betatim/66b6ee6a780ec2e1c54f653eff198d9d, created April 13, 2023 08:55
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
import numpy as np

X, y = make_classification(n_features=5, random_state=42)
rng = np.random.RandomState(10)

# According to the docs https://scikit-learn.org/stable/common_pitfalls.html#id2,
# in particular the subsection on cloning estimators, these two estimators
# should influence each other because they share the `rng` instance.
sgd = SGDClassifier(random_state=rng)
sgd2 = clone(sgd)

# However, the fitted coefs are the same for both `sgd` and `sgd2`.
# This is surprising: I expected them to differ, by roughly as much as
# sgd.fit(X, y).coef_ varies when you call it several times in a row.
print(sgd.fit(X, y).coef_)
print(sgd2.fit(X, y).coef_)
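A minimal sketch of one possible explanation, assuming `clone` passes non-estimator parameters (such as a `RandomState`) through `copy.deepcopy`, as it does in the scikit-learn versions I know of. If so, `sgd2` gets its own generator in the same state as `rng`, and both fits draw the same seeds. `states_equal` is a hypothetical helper written only for this illustration. The last part shows one way to make two estimators really share a generator: re-attach it with `set_params` after cloning.

```python
# Sketch: why `sgd` and `sgd2` end up with identical coefficients.
# Assumption: clone() deep-copies non-estimator params such as a RandomState.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier


def states_equal(a, b):
    """Hypothetical helper: True if two RandomState objects have identical state."""
    name_a, keys_a, *rest_a = a.get_state()
    name_b, keys_b, *rest_b = b.get_state()
    return name_a == name_b and np.array_equal(keys_a, keys_b) and rest_a == rest_b


X, y = make_classification(n_features=5, random_state=42)
rng = np.random.RandomState(10)

sgd = SGDClassifier(random_state=rng)
sgd2 = clone(sgd)

# The original estimator keeps the exact object that was passed in...
original_shares_rng = sgd.random_state is rng
# ...while the clone holds a different RandomState object.
clone_shares_rng = sgd2.random_state is rng
# That copy starts from the same internal state as `rng` (checked before any fit).
clone_starts_equal = states_equal(sgd2.random_state, rng)

# Each estimator draws from its own generator, both starting from the same
# state, so both fits see exactly the same randomness.
coef_1 = sgd.fit(X, y).coef_.copy()
coef_2 = sgd2.fit(X, y).coef_.copy()

print(original_shares_rng, clone_shares_rng, clone_starts_equal)
print(np.array_equal(coef_1, coef_2))

# For contrast: re-attach the same generator after cloning so that both
# estimators really do draw from one shared RandomState.
shared = np.random.RandomState(10)
est_a = SGDClassifier(random_state=shared)
est_b = clone(est_a).set_params(random_state=shared)
truly_shared = est_a.random_state is est_b.random_state
coef_a = est_a.fit(X, y).coef_.copy()  # advances `shared`
coef_b = est_b.fit(X, y).coef_.copy()  # continues from the advanced state
```

If the deep-copy explanation holds, `coef_a` should match `coef_1` (both start from a fresh `RandomState(10)`), while `coef_b` is fitted with draws from later in the shared stream, which is the "influence each other" behaviour the docs describe.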
I agree they aren't the same, but I think they should be.
The RNG instances are not the same, which you can check with: