@jfrasco can you give a specific example?
With the Boston dataset, your constraints work fine:
import numpy as np
from sklearn.datasets import load_boston  # removed in recent scikit-learn versions

X, y = load_boston(return_X_y=True)
A = np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])
B = np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))
model = ConstrainedLinearRegression(
    A=A,
    B=B,
)
model.fit(X, y)
print(model.intercept_)
print(model.coef_)
print(model.coef_.sum())
produces
20.858240975103065
[0. 0.14213999 0. 0.85786001 0. 0.
0. 0. 0. 0. 0. 0.
0. ]
1.0
@avidale Does this work with generalized linear models like Lasso and Ridge?
Yes, it does. For Ridge the adaptation is particularly easy, because Ridge uses exactly the same math as OLS - you only need to change a couple of lines of code to add the L2 penalty.
With Lasso, however, the loss function is no longer quadratic, so the adaptation will be messier. But it is still possible.
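To make that concrete, here is a rough sketch (not the gist's actual code) of the only change a Ridge penalty requires in the coordinate updates; alpha is the hypothetical penalty strength:

import numpy as np

def ridge_coordinate_pass(X, y, beta, alpha):
    """One cycle of coordinate descent for the L2-penalized loss
    0.5 * ||X @ beta - y||**2 + 0.5 * alpha * ||beta||**2 (sketch, no constraints)."""
    hessian_diag = (X ** 2).sum(axis=0) + alpha               # OLS curvature plus the L2 term
    for i in range(len(beta)):
        grad_i = X[:, i] @ (X @ beta - y) + alpha * beta[i]   # OLS gradient plus the L2 term
        beta[i] -= grad_i / hessian_diag[i]                    # Newton step along axis i
    return beta

The two "+ alpha" terms are the entire change relative to the OLS version; the constrained projection step would stay as it is in the gist.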
I can definitely confirm that it works on the Boston data as I just tried it, but I can't get it to work on my data. Not sure how to share my data, but I essentially have 4 columns of 36 months of monthly returns for my X, and one column of 36 monthly returns for my y.
I get the following from a normal LinearRegression with no constraints:
-0.0004900024100277615
[0.18638477 0.2053274 0.07690057 0.64233321]
1.1109459483489401
Using the above, I get this:
-0.002232462269444445
[1. 0. 0. 0.]
1.0
Which is clearly wrong!
Here's my X:
array([[ 6.76395100e-03, 1.05127870e-02, 1.38131920e-02,
8.91216900e-03],
[ 3.11930700e-03, -1.18833000e-04, -5.56112100e-03,
-1.40301670e-02],
[ 1.22898000e-02, 1.48354070e-02, 1.76495830e-02,
1.46773540e-02],
[ 8.20155400e-03, 1.03513650e-02, 9.93534000e-03,
1.05970970e-02],
[ 5.84342900e-03, 5.10907300e-03, 7.89939000e-03,
1.37747080e-02],
[-1.82564300e-03, -1.21698960e-02, -2.41190460e-02,
-2.72551080e-02],
[ 2.49304000e-05, -3.81050000e-03, -7.14523700e-03,
-1.21222690e-02],
[ 2.20000000e-02, 3.61578590e-02, 5.27356990e-02,
6.43562460e-02],
[ 1.04000000e-02, 1.26722980e-02, 1.50119400e-02,
2.13071580e-02],
[ 1.13000000e-02, 1.55012380e-02, 2.93593550e-02,
5.01564280e-02],
[ 4.31893300e-03, 8.90650800e-03, 1.62781490e-02,
2.58799160e-02],
[-4.10587000e-05, -3.94980900e-03, -5.08411500e-03,
-9.01910500e-03],
[-2.98000000e-04, -5.12142700e-03, -1.46469070e-02,
-2.54150210e-02],
[ 3.04000000e-03, -3.39805700e-03, -9.74336200e-03,
-1.52240690e-02],
[ 9.17000000e-03, 1.30487420e-02, 2.33631520e-02,
2.75515380e-02],
[ 1.67858300e-03, -6.15091800e-03, -1.59148770e-02,
-2.34790290e-02],
[ 7.63690800e-03, 8.28053100e-03, 5.80860500e-03,
7.27193000e-04],
[ 1.28940800e-02, -2.54326700e-03, -1.46813370e-02,
-2.31787980e-02],
[ 1.77329260e-02, 2.27723310e-02, 3.30301960e-02,
4.45734430e-02],
[ 1.02051920e-02, 1.15040150e-02, 9.76096600e-03,
4.02489100e-03],
[ 6.66632800e-03, 1.91852410e-02, 2.52234550e-02,
3.31616260e-02],
[ 4.08310700e-03, 4.35669700e-03, -6.53384000e-04,
-7.84317600e-03],
[ 7.79128400e-03, 9.49587000e-03, 1.26496340e-02,
1.46050810e-02],
[ 8.28355200e-03, 7.66034500e-03, 9.65023700e-03,
6.15344600e-03],
[ 8.79511300e-03, 1.07991700e-02, 8.79172400e-03,
4.14202900e-03],
[ 9.26750300e-03, 1.95014440e-02, 2.70481780e-02,
2.18226380e-02],
[ 5.34313000e-04, -4.78709000e-04, -4.36123200e-03,
-1.08921810e-02],
[ 7.35792400e-03, 1.13625030e-02, 1.09614140e-02,
8.92494200e-03],
[ 4.28727900e-03, 1.61191700e-03, 2.38737700e-03,
7.29546800e-03],
[ 5.54352700e-03, 3.69901700e-03, -4.00165900e-03,
-5.07037900e-03],
[ 4.55277800e-03, 9.19459000e-03, 8.14547700e-03,
-2.54201800e-03],
[ 5.42147000e-03, 1.34941860e-02, 1.90196850e-02,
2.05033210e-02],
[ 9.42743600e-03, 1.94949930e-02, 1.10851060e-02,
8.66207300e-03],
[ 7.85459900e-03, 1.68533700e-03, 8.06887000e-04,
-2.86475900e-03],
[-5.04015600e-03, -7.08023830e-02, -1.09417595e-01,
-1.16886912e-01],
[ 4.09176120e-02, 7.53775410e-02, 5.31236080e-02,
1.80014280e-02]])
Here's my y:
array([ 0.00981752, -0.00945781, 0.01503509, 0.01026975, 0.01106711,
-0.02300287, -0.00972671, 0.05655607, 0.01859124, 0.03931067,
0.02070886, -0.00702471, -0.01954932, -0.01169865, 0.02418464,
-0.01865414, 0.00299641, -0.0170793 , 0.0386599 , 0.00638496,
0.02901831, -0.00440659, 0.01331681, 0.00749591, 0.00638459,
0.02274299, -0.00743834, 0.00976891, 0.00508093, -0.00349541,
0.00221714, 0.01862392, 0.01185967, -0.00043719, -0.09747239,
0.03918195])
This is a really nice application for constraining the sums of coefficients, but can you do it together with individual beta constraints? For instance enforcing a bound on the sum as well as an upper bound on each beta.
@jfrasco yes, it seems that your data is a case where coordinate descent sucks, because the optimization algorithm is allowed to move only along the axes, and it gets stuck in a sharp corner of the restricted area.
One way of not completely solving, but mitigating this problem would be to decrease the "learning rate": instead of subtracting grad[i] / hessian[i,i] from beta[i], subtract only a fraction of it on each step.
In deep learning, a learning rate of about 1e-4 is often good, and it works well for your particular dataset too, producing the coefficient vector [0.39777213 0.22207764 0.19722112 0.18292911].
I have updated the code to make the learning rate a tunable hyperparameter.
This solution is still far from optimal, so you may want to use more sophisticated algorithms, such as L-BFGS-B (implemented, e.g., in scipy).
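For illustration, a minimal sketch of the damped coordinate step described above (not the gist's exact code; lower and upper are hypothetical per-coefficient bounds standing in for whatever interval the constraints A @ beta <= B leave for each coordinate):

import numpy as np

def damped_coordinate_pass(X, y, beta, learning_rate=1e-4, lower=None, upper=None):
    """One cycle of coordinate descent that takes only a fraction of the Newton step,
    then clips each coefficient back into its allowed interval."""
    hessian_diag = (X ** 2).sum(axis=0)
    for i in range(len(beta)):
        grad_i = X[:, i] @ (X @ beta - y)                     # OLS gradient along axis i
        beta[i] -= learning_rate * grad_i / hessian_diag[i]   # damped Newton step
        if lower is not None:
            beta[i] = max(beta[i], lower[i])                  # project back into the feasible interval
        if upper is not None:
            beta[i] = min(beta[i], upper[i])
    return beta

With learning_rate=1 this reduces to the undamped update that got stuck on the data above.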
@avidale Thanks for the quick response. I will try it out on other data I have to see how it works, but this should have been a simple case, because my y (the South African All Bond Index) is actually made up of the Xs (the individual components of the ALBI, representing the different duration buckets).
I will investigate your suggestion above, but is it an alternative to what your method solves? This is exactly what I am looking for.
@jfrasco yes, exactly, L-BFGS-B is a more sophisticated alternative to my method.
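For reference, a minimal scipy sketch of the box-constrained part (every coefficient bounded individually). Note that L-BFGS-B only supports per-coefficient bounds; a constraint on the sum of the coefficients needs a method that accepts general linear constraints, e.g. SLSQP or trust-constr:

import numpy as np
from scipy.optimize import minimize

def fit_bounded_ols(X, y, bounds):
    """Least squares with per-coefficient bounds via L-BFGS-B (sketch).
    `bounds` is one (low, high) pair per column of X."""
    def loss(beta):
        resid = X @ beta - y
        return 0.5 * resid @ resid

    def grad(beta):
        return X.T @ (X @ beta - y)

    result = minimize(loss, np.zeros(X.shape[1]), jac=grad,
                      method="L-BFGS-B", bounds=bounds)
    return result.x

# e.g. every coefficient constrained to [0, 1]:
# coef = fit_bounded_ols(X, y, bounds=[(0, 1)] * X.shape[1])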
@davendw49 please look at the examples above
Oops, I missed that part.
Hello,
I have used your code for my dataset and it works. Thanks for sharing this. I still need a bit of help/guidance, though. The sklearn linear regression also allows you to use weights (sample_weight). I wanted to know how I can add sample weights to this code; I would like to use both specific constraints on each coefficient and specific weights for each sample. Thank you.
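One standard way to get sample weights without changing the gist's solver is to rescale the data before fitting: weighted least squares on (X, y, w) is ordinary least squares on rows multiplied by sqrt(w). A sketch (ConstrainedLinearRegression is the class from this gist; sample_weight is the weight vector):

import numpy as np

def rescale_for_sample_weight(X, y, sample_weight):
    """sum_j w_j * (x_j @ beta - y_j)**2  ==  ||diag(sqrt(w)) @ (X @ beta - y)||**2,
    so fitting on the rescaled rows gives the weighted solution."""
    s = np.sqrt(np.asarray(sample_weight, dtype=float))
    return X * s[:, None], y * s

# Xw, yw = rescale_for_sample_weight(X, y, sample_weight)
# model = ConstrainedLinearRegression(A=A, B=B)
# model.fit(Xw, yw)

Caveat: if the model fits an intercept, the implicit constant column gets scaled as well, so for exact weighted results it is safer to append a column of ones to X before rescaling and fit without an intercept.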
Hi, how do I put both constraints together: the minimum beta value should be 1 and the sum of all the betas should be less than or equal to a particular number?
See my example above. -np.identity adds one row per beta with a -1 in that beta's position in A, which encodes -beta_i <= B_i, i.e. beta_i >= -B_i; you then put the negated minimum values in B.
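Concretely, following the same A @ beta <= B construction as the Boston example above, "every beta >= 1 and sum of betas <= S" (with S a placeholder for your bound) would look like this:

import numpy as np

n = X.shape[1]   # number of features
S = 10.0         # placeholder for your upper bound on the sum of the betas

# beta_i >= 1 is written as -beta_i <= -1  (one -identity row per coefficient);
# sum(beta) <= S is a single row of ones.
A = np.concatenate([-np.identity(n), np.ones((1, n))])
B = np.concatenate([-np.ones(n), np.array([S])])

model = ConstrainedLinearRegression(A=A, B=B)
model.fit(X, y)

Note that the two constraints are only compatible when S >= n, since n coefficients of at least 1 each already sum to n.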
It's not working.
I am putting 2 conditions together:
- min beta value of 1
- sum of all betas <= a number
Please paste your code.
Can I also make the intercept positive? If yes, please let me know how. I am using the same code, just setting min_coef = 1.
@avidale how is the coordinate descent handling the constraints? For example, if I used an interior-point solver, it would create a barrier function for the inequalities and then use a Newton line search.
EDIT: And what do you mean by "recalculating constraints for every coefficient on each step"?
Thank you very much for sharing. This is what I'm looking for. Is there a way to constrain just a specific subset of features and leave everything else as is?
Brilliant solution and well explained. I'm late to the party here, but how would one modify this to use weighted least squares with constraints?
This is brilliant, but I can't get it to work with individual constraints in addition to the aggregate constraint, i.e. I've added individual constraints that all coefficients must be between 0 and 1, as well as a constraint on the sum of all of them.
A=np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])
B=np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))