What does calling fit() multiple times on the same model do?
Full question
https://stackoverflow.com/questions/4984...
Accepted answer links:
[estimators, supporting "Incremental learning" (those, that implement ]: https://scikit-learn.org/0.15/modules/sc...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #machinelearning #scikitlearn
ACCEPTED ANSWER
Score 100
If you call model.fit(X_train, y_train) a second time, it overwrites all previously fitted coefficients, weights, intercept (bias), etc.
If you want to fit on just a portion of your data set and then improve the model by fitting on new data, you can use estimators that support "incremental learning" (those that implement the partial_fit() method).
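As a minimal sketch of incremental learning, the snippet below feeds two batches to an estimator through partial_fit(). The choice of SGDClassifier, the synthetic data, and the variable names are assumptions made for illustration; they are not from the original answer.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic two-class data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# The first partial_fit() call must be told every class label up front,
# because later batches are not guaranteed to contain all of them.
clf.partial_fit(X[:100], y[:100], classes=np.unique(y))
coef_after_first = clf.coef_.copy()

# The second call continues from the current weights instead of resetting them.
clf.partial_fit(X[100:], y[100:])
coef_after_second = clf.coef_.copy()

print(coef_after_first)
print(coef_after_second)
```

Only estimators that implement partial_fit() can be trained this way; calling fit() on clf afterwards would start from scratch again.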
ANSWER 2
Score 8
The terms fit() and train() can be used interchangeably in machine learning. Depending on the classifier you have instantiated, for example clf = GaussianNB() or clf = SVC(), your model uses the corresponding machine learning technique.
As soon as you call clf.fit(features_train, label_train), your model starts training on the features and labels you passed.
You can then use clf.predict(features_test) to predict.
If you call clf.fit(features_train2, label_train2) again, it starts training from scratch on the new data and discards the previous results. The following are reset inside the model:
- Weights
- Fitted Coefficients
- Bias
- Any other training-related state
You can also use the partial_fit() method, where the estimator provides it, if you want to keep what was previously learned and continue training on the next batch of data.
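The reset behaviour of a repeated fit() call can be checked with a short sketch. LinearRegression and the noise-free synthetic data here are assumptions chosen so that the expected coefficients are easy to see; any estimator could be substituted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# Two unrelated, noise-free data sets with different true coefficients.
X1 = rng.rand(50, 2)
y1 = X1 @ np.array([1.0, 2.0])
X2 = rng.rand(50, 2)
y2 = X2 @ np.array([-3.0, 0.5])

model = LinearRegression()
model.fit(X1, y1)
coef_first = model.coef_.copy()   # close to [1.0, 2.0]

model.fit(X2, y2)                 # second fit: earlier coefficients are discarded
coef_second = model.coef_.copy()  # close to [-3.0, 0.5]

# A brand-new estimator fitted only on the second data set gives the same result,
# which shows that nothing from the first fit survived.
fresh = LinearRegression().fit(X2, y2)
same_as_fresh = np.allclose(coef_second, fresh.coef_)
print(coef_first, coef_second, same_as_fresh)
```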
ANSWER 3
Score 2
Beware that fit() returns the estimator itself, so the model is effectively passed "by reference". Here, model1 and model2 are the same object, and the second fit overwrites model1:
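import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df1 = pd.DataFrame(np.random.rand(100).reshape(10, 10))
df2 = df1.copy()
df2.iloc[0, 0] = df2.iloc[0, 0] - 2  # change one value
pca = PCA()
model1 = pca.fit(df1)  # fit() returns pca itself
model2 = pca.fit(df2)  # refits the same object, so model1 changes too
np.unique(model1.explained_variance_ == model2.explained_variance_)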
Returns
array([ True])
To avoid this, take a deep copy of the fitted estimator before refitting:
from copy import deepcopy
model1 = deepcopy(pca.fit(df1))
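The aliasing described in this answer can be made explicit with an identity check. The sketch below uses plain NumPy arrays and the hypothetical names data1, data2, snapshot, and independent; it also shows a second option, fitting a separate estimator instance, which is an addition for illustration rather than part of the original answer.

```python
import numpy as np
from copy import deepcopy
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
data1 = rng.rand(10, 10)
data2 = data1.copy()
data2[0, 0] -= 2  # change one value

pca = PCA()
model1 = pca.fit(data1)
snapshot = deepcopy(model1)   # independent copy of the first fitted state

model2 = pca.fit(data2)       # refits the same object in place

# model1, model2 and pca are all names for one object.
same_object = (model1 is model2) and (model2 is pca)

# The deep copy still holds the results of the first fit.
snapshot_differs = not np.allclose(
    snapshot.explained_variance_, model2.explained_variance_
)

# Alternative: use a separate estimator instance for each data set.
independent = PCA().fit(data1)
matches_snapshot = np.allclose(
    independent.explained_variance_, snapshot.explained_variance_
)
print(same_object, snapshot_differs, matches_snapshot)
```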