Last active
December 30, 2017 17:09
Old text file version of completed and pending/potential tasks for excel-mregress
-- TO DO --
xxx Strip all Optional arguments, to ensure specificity of behavior?
===THESE MAY BE REDUNDANT===
Parameterized predictor transforms, eventually to be optimized over, to find, e.g., the optimal exponent for a nonlinear dependence.
Workflow mini-language? Multi-step transforms from source (predictor transform->residual analysis->...)
Arbitrary, functional-form definitions of transforms? (<!predname!>, e.g.)
Combinatorial parameter cluster model selection (e.g., run model selection for all subsets of size m out of these n things,
    and report comparative performance).
Implement some (optional?) relative datasource reference storage, so that rgns are more portable from machine to machine.
Given a regression against a set S of N predictors, containing one specific predictor P of interest, implement quick means for (at minimum) plotting (ordinate) the residuals of a fit against the N-1 predictors (S ∩ !P) versus (abscissa) the predictor P. This would give an *approximate* visualization of the actual fitting value of P. HOWEVER: removing P from the fit DOES CHANGE the fitting behavior over the (S ∩ !P) predictors, because that reduced fit no longer accounts for the effects of P as it participates in tandem with the (S ∩ !P) other predictors!
===END===
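
A rough illustration of the combinatorial item in the block above (all subsets of size m out of n predictors). This is a Python sketch, not the add-in's VBA; `rank_subsets` and `score_model` are hypothetical names, and "lower score is better" is an assumption that would suit an AIC-like criterion.

```python
from itertools import combinations

def rank_subsets(predictors, m, score_model):
    """Score every size-m subset of predictors and sort ascending by score."""
    results = [(score_model(subset), subset) for subset in combinations(predictors, m)]
    return sorted(results, key=lambda pair: pair[0])

# Placeholder scoring function for illustration only (total name length);
# a real run would fit the model on each subset and return its criterion.
def toy_score(subset):
    return sum(len(name) for name in subset)

ranked = rank_subsets(["x1", "x2", "x10"], 2, toy_score)
```

The number of subsets grows as n-choose-m, so any real implementation would probably need a cap on n or m before enumerating.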
*** Implement capability to prompt for re-selection of source workbook when regression is moved
xxx *** Implement ability to select where to place model selection results worksheet
xxx *** PLOTTING: Implement option to include an identity line?
xxx Only really relevant for 'fitted response vs response' plots
Implement ability to prevent predictors from being downselected in automatic model selection?
*** Re-work / streamline model selection code to use .addFilter, .delFilter
"Generalize 'XDataCheck' .... not sure what I meant when writing this?"
Implement retaining form positioning at least for Main Menu and Plot forms, maybe Filter Predictors, too.
Add # of outliers at a given alpha (set alpha? tie to plot form? one or more standard alphas?)
    to the main form rgn summary?
It seems that sometimes it's desirable to clear the filters when Editing; sometimes not. Need to
    enable selective filter clear(?)
ABSOLUTELY need to harden against formula-error cases (collinearity, etc. - unless/until such
    things are dealt with on a deeper level)
Consider a more elegant check/correction in reg.modifyRegression for the prior charting status,
    perhaps basically just bumping abscissa and/or ordinate to a default if either/both is presently
    set to a predictor. Current always-force-to-default-charting is strongly cautious; who knows
    if this caution is warranted?
Check first cell of each range of loaded source and compare to stored prior value
    as a relatively robust check against changes to the source book
    If change(s) detected, prompt user and recommend confirming source references
        are correct
    Will probably have to handle differently in different load circumstances?
    OR -- could hash the dataset somehow
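
One possible shape for the "hash the dataset" idea, sketched in Python rather than VBA. `dataset_fingerprint` is a hypothetical helper, and the choice of SHA-256 over the `repr` of each cell is an assumption for illustration; the stored digest would be compared against a fresh one on load.

```python
import hashlib

def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest of a 2-D table of cell values."""
    h = hashlib.sha256()
    for row in rows:
        # repr() keeps 1, 1.0 and "1" distinct; control-character separators
        # mark the cell and row boundaries.
        h.update("\x1f".join(repr(v) for v in row).encode("utf-8"))
        h.update(b"\x1e")
    return h.hexdigest()

stored = dataset_fingerprint([[1.0, 2.5], [3.0, 4.25]])   # saved with the regression
current = dataset_fingerprint([[1.0, 2.5], [3.0, 4.26]])  # recomputed on load
source_changed = stored != current
```

Unlike the first-cell check, this catches a change anywhere in the data, at the cost of reading every cell on each load.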
Robustify config section of Reg load to proofread against empty cells or invalid values
    -- Apply defaults if bad settings are detected
Automate and robustify placement of plot trendline datalabel to avoid covering data points
Add v[ii] as a plottable variable?
Add checkbox option for whether to include the linfit datalabel
#### ADD CHECK TO ENSURE DELETING A REGRESSION DOESN'T PULL THE SOURCE OUT FROM UNDER
#### A CURRENTLY OPEN REGRESSION! Add to delete and close functions
Possibly add a check and notification to the Edit and Filter functions, to advise user
    that the underlying data of an open Reg has been changed if the Edited/Filtered reg
    is the source for that other Reg
Change initial folder of fd on NameEntry form if already defined within the Regression
    -- OR, consider adding a 'currentBrowseFolder' to RegressMain: Remember last browsed folder
        for opening regressions on the main form
Custom sigfigs on charts -- linfit datapoint, normalization factors in axis labels
Tab order on the Chart form
Plots of, or against, unfiltered predictors?
Plots including vs excluding filtered points
    Would want to color based on filtered/unfiltered points
    Wouldn't be able to plot fit parameters (residuals, etc.) for filtered points, though, since they wouldn't exist
        for the filtered-out points
Have to trap for if a Regression returns bad numbers / errors / etc.
***Implement GUI filtering of data points
***Automated polynomials/interactions capability
Partial analysis: Auto-gen of new Regression that calculates residuals against a given predictor for the response and all other *active* predictors
    (Y(xi) vs {Xj!=i(xi)})
***Automatic generation of factored datasets -- new SOURCE books with residuals of the response and of
    a subset of predictors against another subset of predictors. (ability to see on a chart how the response
    actually depends on a subset of parameters (incl. a single parameter), with the effects of other predictors
    'factored out')
***'bubble plots' -- color-coded residual/Cook-distance as a function of two variables
    (two predictors, one predictor+response?)
***Filter/unfilter outliers determined by studentized analysis, or in a range of Cook's distances, or such
Covariances? Separate sheet generation, only if called for.
    Probably an entire new menu
    What are cov's diagnostic for? Collinear predictors?
Collinearity diagnostics? (adapt 'CheckXData' function into the form code)
Proof predictor and response data ranges to ensure all values contained are numeric, every time source
    defined or refreshed
***Check for whether source book exists when loading Regressions; implement graceful handling if absent
Do Range.CurrentRegion checking to ensure result sheet is properly structured/sized
Hide the internal working sheets?
***Some manner of interactive plot exploration would be really nice (point values, case #'s, whatever)
    - Check box to show case numbers or something?
Q-Q plots for comparing response, predictors, residuals, etc. to various distributions (normal & Weibull at least)
    --Chi-squared test is a formal statistical test for normality -- include on these? Should be
        ~straightforward to calculate when generating the Q-Q plot
-- CUSTOM RUN-TIME ERRORS --
1801 - object not initialized
-- TENTATIVE/UNCERTAIN --
Sort data by any of the above? Not showing explicit data/results columns very much.
May run into problems with collinear predictors at some point...
Coerce unique names for predictors; should hopefully allow for unambiguous naming of, e.g., chart tabs to facilitate auto-reload of previously generated analyses.
Include ability to leave data as a link to source workbook, instead of
    copying as values-only?
    Risky -- much more prone to breakage; probably better to just require
        rebuilding of the regression book
[as needed] Robustify repop of workbooks list after opening of new/closing of old, as required
[as needed] Robustify error handling in actually generating the regression workbook
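
For the unique-predictor-names item in this section, one simple approach is to append a numeric suffix to repeats. The Python sketch below is illustrative only; `coerce_unique` and the `_2`, `_3`, ... suffix scheme are assumptions, not the add-in's behavior.

```python
def coerce_unique(names):
    """Append _2, _3, ... to repeated names so every returned name is distinct."""
    seen = set()
    result = []
    for name in names:
        candidate = name
        suffix = 2
        # Keep bumping the suffix until the candidate collides with nothing,
        # including names that already look like a suffixed duplicate.
        while candidate in seen:
            candidate = f"{name}_{suffix}"
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result

unique = coerce_unique(["temp", "pressure", "temp", "temp_2"])
# unique == ["temp", "pressure", "temp_2", "temp_2_2"]
```

Names used for chart tabs would also have to respect Excel's sheet-name rules, which this sketch does not attempt.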
-- Currently in development --
-- COMPLETED --
[DONE] Fixed bugs in model selection code when a reduced model contains only one predictor
[DONE] Fixed bug when model reduces to one predictor; accidentally was testing number of points instead
    of number of predictors.
[DONE] Added omitted definitions of series names on model selection chart
[DONE] Added priority recall of currently associated save folder for a regression when re-defining it, falling back
    to the last-used application-level folder if missing/invalid
[DONE] Added retention of last-used folder memory for opening source data workbooks
[DONE] Added 'AIC' model selection button to the code enable/disable helper function
***Below completed as of v1.1.0***
[DONE] Implemented retention of last-used folder when opening/creating regressions
[DONE] Automated model selection -- construction of analysis based upon minimization of AIC
    Forward and backward; full factorial if model is small enough... how big is too big?
    Some sort of criterion for summarily dropping a predictor? Threshold on beta p-value?
[DONE] Fixed edge case glitch when numerical precision limits result in diag(V) being >= 1
[DONE] Added helper routine for manual checking of linear independence of datasets
[DONE] Implemented basics of corrected AIC from Hu, 2007
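
For reference alongside the AIC items above, here is a Python sketch of the common textbook forms for a least-squares fit: AIC = n ln(RSS/n) + 2k (up to an additive constant) and AICc = AIC + 2k(k+1)/(n-k-1). Whether these match the correction taken from Hu, 2007, or what the add-in actually computes, is not confirmed by these notes; the function names are hypothetical, and the convention for counting k (for example, whether the error variance is included) is an assumption left to the caller.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def aicc_least_squares(rss, n, k):
    """Small-sample corrected AIC; only defined when n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic_least_squares(rss, n, k) + (2 * k * (k + 1)) / (n - k - 1)

# Made-up numbers: the 3-parameter model has a slightly lower RSS
# but carries a larger complexity penalty.
small = aicc_least_squares(rss=12.0, n=20, k=2)
large = aicc_least_squares(rss=11.5, n=20, k=3)
prefer_small = small < large
```

With these made-up inputs the smaller model scores lower, which is the kind of comparison a forward or backward selection step would make at each candidate change.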
***Below were completed as of v1.0.0***
[DONE] Different plot sizes for different uses
    - large, how it is now
    - medium, suitable for documents or small images in presentations, still has axis labels &c
    - small, no axis labels, only min/max of axis ranges; for thumbnails
        Thumbnails still have multiple tick labels in initial implementation
[DONE] Fiddle with Enter key behavior on the NameEntry form
[DONE] Highlight outliers in some fashion, either in data or on plots
    --Customizable alpha
[DONE] Add various charting capabilities for acting upon generated workbook
    Plot any of these versus any other:
        predictor, response, fit-response, residual, studentized residual, t-stat, Cook distance.
    Also, residuals (or anything) vs sequence number
[DONE] Include student-t for identifying potential outliers
[DONE] Will have to figure out how to reversibly filter data points and predictors
    Some sort of unique ID on the data points would probably help... row number of original data?
    Apply filter (delete rows/columns in filtered sheet) in reverse order for easy indexing
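
The reverse-order note above can be illustrated with a short Python sketch (hypothetical `apply_row_filter`, operating on a plain list instead of worksheet rows): deleting from the highest index downward means no pending index is shifted by an earlier deletion.

```python
def apply_row_filter(rows, filtered_indices):
    """Return a copy of rows with the given 0-based indices removed."""
    remaining = list(rows)
    # Highest index first, so rows still waiting to be deleted keep
    # their original positions.
    for idx in sorted(set(filtered_indices), reverse=True):
        del remaining[idx]
    return remaining

kept = apply_row_filter(["r0", "r1", "r2", "r3", "r4"], [1, 3])
# kept == ["r0", "r2", "r4"]
```

Because the original list is left untouched, the filter stays reversible: re-running with a different index set starts again from the full data.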
[DONE] Add capability to have multiple regression workbooks open at once, with
    switching between them (likely means new form)
[DONE] Will have to have a unified 'recreate regression' function w/in the class in order to
    have the revised results propagate after data points or predictors are filtered out
[DONE] Add ability to close workbooks
[DONE] Add ability to include/exclude constant from the regression
[DONE] Implement R^2 calculation
[DONE] Implement stderr calculations for beta parameters
[DONE] IMPLEMENT names for predictors...
[DONE] Add capability to automatically reconstruct the necessary references, etc.
    for a saved regression workbook
    - Recall source data
    - Recall regression workbook (just SAVED!)
[DONE] F-statistics
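
As a reminder of what the R^2 and F-statistic items in this list compute, here is a Python sketch using the standard definitions for a fit that includes a constant term: R^2 = 1 - SSE/SST and F = ((SST - SSE)/p) / (SSE/(n - p - 1)). The function name is hypothetical, and the no-constant case (which the add-in also supports) would need different formulas and is not covered here.

```python
def r_squared_and_f(y, fitted, n_predictors):
    """R^2 and overall F for a least-squares fit with an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    df_resid = n - n_predictors - 1
    r2 = 1.0 - sse / sst
    f_stat = ((sst - sse) / n_predictors) / (sse / df_resid)
    return r2, f_stat

# Fitted values are from the least-squares line y = 0.6 + 0.8*x for x = 1..5.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]
r2, f_stat = r_squared_and_f(y, fitted, n_predictors=1)
```

For this small example SST is 10 and SSE is 3.6, giving an R^2 of 0.64 and an F of about 5.33 on 1 and 3 degrees of freedom.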