@betatim
Created April 13, 2023 08:55
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_features=5, random_state=42)
rng = np.random.RandomState(10)

# According to the docs https://scikit-learn.org/stable/common_pitfalls.html#id2
# (in particular the subsection on cloning for estimators), these two estimators
# should influence each other because they share the `rng` instance.
sgd = SGDClassifier(random_state=rng)
sgd2 = clone(sgd)

# However, the fitted coefs are the same for both `sgd` and `sgd2`.
# This is surprising. I would have expected them to be different,
# with the variation being similar to what you see when you call
# sgd.fit(X, y).coef_ multiple times in a row.
print(sgd.fit(X, y).coef_)
print(sgd2.fit(X, y).coef_)
@adrinjalali

The RNG instances are not the same, which you can check with:

# %%
import numpy as np
from sklearn.base import clone

# %%
rng = np.random.RandomState(10)
rng2 = clone(rng, safe=False)
id(rng) == id(rng2)
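
A minimal sketch of why the two fits in the gist can still agree, assuming `clone(..., safe=False)` hands back an independent copy that preserves the generator's internal state. The variable names below are illustrative and not part of the original gist.

```python
import numpy as np
from sklearn.base import clone

rng = np.random.RandomState(10)
rng_copy = clone(rng, safe=False)

# The copy is a distinct Python object...
is_same_object = rng is rng_copy

# ...but both generators start from the same state, so drawing from each
# one separately yields the same sequence of numbers.
draws_original = rng.randint(0, 1000, size=5)
draws_copy = rng_copy.randint(0, 1000, size=5)
draws_match = bool(np.array_equal(draws_original, draws_copy))

print(is_same_object, draws_match)
```

Under that assumption, `sgd` and `sgd2` each consume their own generator from an identical starting point, which would explain the identical `coef_` values.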

@betatim

betatim commented Apr 13, 2023

I agree they aren't the same, but I think they should be.
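
For comparison, a hedged sketch of one way to get estimators that really do share a generator: pass the same `RandomState` to each constructor directly instead of going through `clone`. The names `sgd_a`, `sgd_b`, and `sgd_c` are hypothetical and only serve this illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(10)

# Two estimators built directly from the same RandomState instance.
sgd_a = SGDClassifier(random_state=rng)
sgd_b = SGDClassifier(random_state=rng)

# An estimator produced by cloning the first one.
sgd_c = clone(sgd_a)

# Constructor parameters are stored as passed, so sgd_a and sgd_b hold the
# very same generator object; the clone holds a separate copy.
shares_rng = sgd_a.random_state is sgd_b.random_state
clone_shares_rng = sgd_a.random_state is sgd_c.random_state

print(shares_rng, clone_shares_rng)
```

This only checks object identity; whether fitted coefficients then differ depends on how each `fit` call consumes the shared generator.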