Notes on Chapter 2 of Tom Zimmermann's Dissertation

[Paper](https://dash.harvard.edu/bitstream/handle/1/17467320/ZIMMERMANN-DISSERTATION-2015.pdf?sequence=1)

Intro: Econom(etr)ics vs. ML

Economics focuses on empirical relationships between features and outcomes; ML focuses on predicting outcomes.
Beta vs. yhat. cv.coeffs vs cv.metrics.fscore
TZ: Can test a relationship by seeing whether including the variable in a big model improves predictions, thereby avoiding omitted-control issues.
This requires an ML approach (feature engineering) on investor behavior datasets!
Implementation details and robustness checks are more valuable than the actual results on the disposition effect.
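TZ's inclusion test can be sketched in a few lines. This is a minimal illustration on synthetic data, not the dissertation's setup: the table-lookup classifier, the interleaved folds, and the `quartile4`/`gain` toy columns are all assumptions made here. The label is built to equal `quartile4`, and `gain` is only a noisy copy of it.

```python
from collections import Counter, defaultdict

def fit_table(rows, labels, cols):
    """Table-lookup classifier: majority label for each observed feature tuple."""
    votes = defaultdict(Counter)
    for row, y in zip(rows, labels):
        votes[tuple(row[c] for c in cols)][y] += 1
    table = {key: cnt.most_common(1)[0][0] for key, cnt in votes.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen tuples
    return table, default

def cv_accuracy(rows, labels, cols, k=4):
    """Out-of-sample accuracy over k interleaved folds."""
    hits = 0
    for fold in range(k):
        train = [n for n in range(len(rows)) if n % k != fold]
        table, default = fit_table([rows[n] for n in train],
                                   [labels[n] for n in train], cols)
        for n in range(fold, len(rows), k):  # held-out rows of this fold
            key = tuple(rows[n][c] for c in cols)
            hits += table.get(key, default) == labels[n]
    return hits / len(rows)

# Synthetic data (assumption): label equals quartile4; gain disagrees on every 5th row.
rows, labels = [], []
for n in range(40):
    q = (n // 4) % 2
    gain = 1 - q if n % 5 == 0 else q
    rows.append({"quartile4": q, "gain": gain})
    labels.append(q)

# Value of gain on its own vs. its marginal value once quartile4 is in the model.
alone = cv_accuracy(rows, labels, ["gain"]) - cv_accuracy(rows, labels, [])
marginal = (cv_accuracy(rows, labels, ["quartile4", "gain"])
            - cv_accuracy(rows, labels, ["quartile4"]))
```

On this toy data `gain` looks predictive alone but adds nothing once `quartile4` is included, which is the shape of the argument TZ makes about the disposition feature.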

Disposition Effect: Investors reluctant to realize losses

Odean (1998): investors are reluctant to realize losses; they are more likely to sell when the stock price is above the purchase price.
TZ: if we add 844 features, the marginal effect of adding the disposition feature is 0.
TZ: a better variable is whether the purchase price is in the top quartile of recently seen prices
(see feature engineering section)

Data and Setup

160K trades from 40K accounts, 6K tickers
avg. holding period = 103 days
Each trade is represented as an array of days, starting at the BUY date, where on each day the investor chooses to sell or hold.
Ignore client and ticker, since there are so few trades per client and per ticker.
Pairwise classification task: show the model two X rows and ask it to predict which is a Sell (one is a sell and one is a hold).
Guessing-game setup avoids label-imbalance issues
(more selling closer to buy date; once you've held for N days, less likely to sell on day N+1)
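The guessing-game setup can be sketched as follows. This is a rough illustration only: how the dissertation actually matches sell and hold rows is not recorded in these notes, so the random pairing, the `make_pairs` and `pairwise_accuracy` names, and the scoring function are all hypothetical.

```python
import random

def make_pairs(sell_rows, hold_rows, seed=0):
    """Pair each sell row with a randomly drawn hold row (assumed pairing rule).

    The order inside each pair is shuffled, so the label (index of the sell
    row, 0 or 1) is balanced by construction.
    """
    rng = random.Random(seed)
    pairs = []
    for sell in sell_rows:
        hold = rng.choice(hold_rows)
        if rng.random() < 0.5:
            pairs.append(((sell, hold), 0))
        else:
            pairs.append(((hold, sell), 1))
    return pairs

def pairwise_accuracy(pairs, score):
    """Share of pairs in which the higher-scoring row is the actual sell."""
    correct = 0
    for (first, second), label in pairs:
        guess = 0 if score(first) > score(second) else 1
        correct += guess == label
    return correct / len(pairs)

# Toy rows (made up): a single feature "x" that is higher on sell days.
sell_rows = [{"x": 2.0}, {"x": 3.0}]
hold_rows = [{"x": 0.0}, {"x": 1.0}]
pairs = make_pairs(sell_rows, hold_rows)
accuracy = pairwise_accuracy(pairs, lambda row: row["x"])
```

Because every pair contains exactly one sell, a random guesser scores 50% no matter how rare sell days are in the raw data.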

Feature Engineering: Date Ranges and Transformers

Summary:

Each of the 844 features is a transformation of the price history of the traded security. The abstraction is 2 date range getters whose output is passed to 5 transformers. The full set of transformers and date range getters makes 844 features. The features can be loosely grouped into recent price movement, longer-term price movement, and performance since BUY date. Whether the price is in the top quartile of prices since BUY date and the last 2 days' returns are the best features.

Date Range Getters: A(i, j), B(i, j)

A-ranges start from both ends of the price series and move towards the middle.
- Used for computing holding period features.
- A(i, j) => px.loc[i: t-j]
- uses 0 ≤ i < 10, 0 ≤ j < 10
B-ranges start at date t and go back in time (potentially past BUY date). They are used for computing px movement features over various horizons
- B(i, j) => px.loc[t-j: t-i]
- B(0, 1) => px.iloc[-2:]
A(i, j), where 0 ≤ i < 10, 0 ≤ j < 10. Return-on-trade type features, associated w dispo effect. 100 features.
B(i, j), where 0 ≤ i < 5, i + 1 ≤ j < 5. Recent price movement. 20 features.
B(i, j), where i = 0, j ∈ {20, 40, ..., 200}. These define medium-term to long-term price movements relative to t. 10 features.
Together these generate 120 domains.
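A minimal sketch of the two getters on a plain Python list, assuming `.loc`-style inclusive endpoints, a price series that extends before the BUY date, and hypothetical `buy` and `t` positions into that list. Enumerating the index constraints as written gives 100, 10, and 10 (i, j) pairs, which matches the 120 domains.

```python
def a_range(prices, buy, t, i, j):
    """A(i, j): from i days after BUY through j days before t, inclusive."""
    start, stop = buy + i, t - j
    return prices[start:stop + 1] if start <= stop else []

def b_range(prices, t, i, j):
    """B(i, j): from t-j through t-i, inclusive; may reach back before BUY."""
    start = max(t - j, 0)  # clip at the start of the available history
    return prices[start:t - i + 1]

# Enumerate the (i, j) index pairs under the constraints listed above.
a_domains = [("A", i, j) for i in range(10) for j in range(10)]
b_short = [("B", i, j) for i in range(5) for j in range(i + 1, 5)]
b_long = [("B", 0, j) for j in range(20, 201, 20)]
domains = a_domains + b_short + b_long

# Toy series (made up): 30 daily prices, bought at position 10, evaluated at t = 29.
prices = list(range(100, 130))
buy, t = 10, 29
holding_period = a_range(prices, buy, t, 0, 0)  # BUY through t
last_two = b_range(prices, t, 0, 1)             # same as prices[-2:] here
long_window = b_range(prices, t, 0, 20)         # 21 prices, starts before BUY
```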

Transformers:

Gain, Quartile, RefGain, AboveMin, and AboveMax take price ranges as arguments and return -1, 0, or 1. (Note that I've renamed them slightly.)
Map each accr
844 features. How do we get this?
Gain(A(0,0)) = 1 if p_[t] > p_[0] else 0 => Odean's feature
Then Lasso, Logit w L1 penalty, and Decision Trees as classifiers.
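Two of the transformers can be sketched on a plain list of prices. `gain` follows the Gain(A(0,0)) definition above. `quartile4` is only one possible reading of the Quartile feature (the last price ranks in the top quartile of its range); the exact definition is not recorded in these notes, so treat it as an assumption.

```python
def gain(prices):
    """Gain: 1 if the last price in the range exceeds the first, else 0."""
    return 1 if prices[-1] > prices[0] else 0

def quartile4(prices):
    """Assumed reading: 1 if the last price ranks in the top quartile of the range."""
    current = prices[-1]
    # Share of prices in the range that are at or below the current price.
    rank = sum(p <= current for p in prices) / len(prices)
    return 1 if rank > 0.75 else 0

# Toy ranges (made up), e.g. the output of A(0, 0) for two different trades.
rising = [10, 11, 12, 13]
falling = [13, 12, 11, 10]
features = {
    "rising": (gain(rising), quartile4(rising)),
    "falling": (gain(falling), quartile4(falling)),
}
```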

[Important Features:](https://www.dropbox.com/s/u53c66zpks6qs4j/Screenshot%202016-07-03%2019.22.25.png?dl=0)

Recent returns [last 3 days] and Quartile4(A(0,0)). IsMax creates a small improvement.

Robustness Checks

Flip test: how do predictions change if we change feature values?
Predictions only change a lot when flipping Quartile4 and Trend1, Trend2.
What if a feature a lot like Gain(A(0,0)) is doing the same job? Take out all Gain-based features.
Betting Game: if you bet on predictions at odds implied by simple priors, how much would you make?
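The flip test can be sketched as a small helper. This is an illustration under assumptions: features are taken to be 0/1 dummies, and `toy_predict` is a made-up stand-in for a fitted model, not TZ's classifier.

```python
def flip_sensitivity(predict, rows, feature):
    """Mean absolute change in the prediction when a 0/1 feature is flipped."""
    total = 0.0
    for row in rows:
        flipped = dict(row)                   # copy so the input stays untouched
        flipped[feature] = 1 - row[feature]   # 0 -> 1, 1 -> 0
        total += abs(predict(flipped) - predict(row))
    return total / len(rows)

def toy_predict(row):
    """Hypothetical model: sell probability driven by quartile4 only."""
    return 0.2 + 0.5 * row["quartile4"] + 0.0 * row["gain"]

rows = [
    {"quartile4": 0, "gain": 0},
    {"quartile4": 0, "gain": 1},
    {"quartile4": 1, "gain": 0},
    {"quartile4": 1, "gain": 1},
]
sensitivity = {name: flip_sensitivity(toy_predict, rows, name)
               for name in ("quartile4", "gain")}
```

A feature the model ignores scores zero here, while a feature the model leans on moves the predictions by its full weight.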

Results & Tables

[Results](https://www.dropbox.com/s/tfexldq1ia7hpop/Screenshot%202016-07-03%2018.36.55.png?dl=0)
[Feature Importances](https://www.dropbox.com/s/u53c66zpks6qs4j/Screenshot%202016-07-03%2019.22.25.png?dl=0)
Observing the pattern Quartile4=1 and 2 consecutive price increases, one should predict a selling decision for an average reward of 11.986.

Takeaways

Think about whether your setup creates selection (or other) bias.
50D cross-style features seem to have worked well for TZ. (Quartile feature v similar)
Interesting reliance on binary variables, like Quartile4.
Quartile dummies!
Table Lookup algo for model inspection

Random Notes

Traditional deductive Economics: pick relationship you care about and sensible controls.
ML Approach: throw in kitchen sink and cross-validate.
ML can handle situations where K features > N samples.
Movement in price (either direction) associated w selling.

RefGain Transformer

Iteratively define RefPrice(t, η) = η * RefPrice(t − 1, η) + (1 − η) * p_t, where η is a parameter that adjusts the weight between the current price and the past reference price. RefGain is equal to one if the current price exceeds the current RefPrice, and zero otherwise. We construct four such variables, operating on the A(0,0) domain, for parameters η = 0.9, 0.99, 0.999, 0.9999.
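The recursion can be sketched directly. One assumption is needed that the definition does not state: the reference price is seeded with the first price in the range (the BUY-date price for A(0,0)).

```python
def ref_price(prices, eta):
    """Final RefPrice after applying RefPrice(t) = eta * RefPrice(t-1) + (1 - eta) * p_t."""
    ref = prices[0]  # assumption: seed with the first price in the range
    for p in prices[1:]:
        ref = eta * ref + (1 - eta) * p
    return ref

def ref_gain(prices, eta):
    """RefGain: 1 if the current (last) price exceeds the current RefPrice, else 0."""
    return 1 if prices[-1] > ref_price(prices, eta) else 0

ETAS = (0.9, 0.99, 0.999, 0.9999)

# Toy price paths (made up), standing in for the A(0, 0) range of a trade.
rising = [10.0, 10.5, 11.0, 12.0]
falling = [12.0, 11.0, 10.5, 10.0]
rising_features = [ref_gain(rising, eta) for eta in ETAS]
falling_features = [ref_gain(falling, eta) for eta in ETAS]
```

Larger η makes the reference price stickier, so with η = 0.9999 it stays very close to its starting value and RefGain behaves much like Gain.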
