What does calling fit() multiple times on the same model do?

Full question
https://stackoverflow.com/questions/4984...

Accepted answer links:
[estimators, supporting "Incremental learning" (those, that implement ]: https://scikit-learn.org/0.15/modules/sc...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #machinelearning #scikitlearn


ACCEPTED ANSWER

Score 100

If you call model.fit(X_train, y_train) a second time, it overwrites all previously fitted coefficients, weights, the intercept (bias), and so on.
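A minimal sketch of this behaviour, assuming LinearRegression and two made-up synthetic targets (neither comes from the original answer). After the second fit() the learned slope reflects only the second data set, exactly as if a fresh estimator had been trained on it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: the same inputs with two different linear targets.
X = np.arange(10, dtype=float).reshape(-1, 1)
y_first = 2.0 * X.ravel() + 1.0
y_second = -3.0 * X.ravel() + 5.0

model = LinearRegression()

# First fit: the model learns the slope of y_first.
model.fit(X, y_first)
coef_after_first = model.coef_.copy()

# Second fit on the same object: the earlier state is discarded.
model.fit(X, y_second)
coef_after_second = model.coef_.copy()

# A brand-new estimator trained only on the second target, for comparison.
fresh = LinearRegression().fit(X, y_second)

print(coef_after_first, coef_after_second, fresh.coef_)
```

Copying coef_ before the second call matters here, because the attribute on the estimator is replaced by the refit.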

If you want to fit just a portion of your data set and then improve the model by fitting new data, use an estimator that supports "Incremental learning" (one that implements the partial_fit() method).
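As a rough illustration, here is how incremental learning might look with SGDClassifier, which is one of the estimators that implements partial_fit(). The choice of estimator, the random data, and the batch split are assumptions made for this sketch:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Illustrative two-class data, split into two batches of 100 rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# First batch: the full set of class labels must be passed on the first call.
clf.partial_fit(X[:100], y[:100], classes=np.array([0, 1]))
coef_after_batch1 = clf.coef_.copy()

# Second batch: continues from the existing weights instead of starting over.
clf.partial_fit(X[100:], y[100:])
coef_after_batch2 = clf.coef_.copy()

print(coef_after_batch1)
print(coef_after_batch2)
```

Calling clf.fit() at any point would still reset the model, so partial_fit() has to be used for every batch.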

ANSWER 2

Score 8

The terms fit and train are used interchangeably in machine learning. Depending on the classification model you have instantiated, for example clf = GaussianNB() or clf = SVC(), your model uses the corresponding machine learning technique.
As soon as you call clf.fit(features_train, label_train), the model starts training on the features and labels you passed.

You can then use clf.predict(features_test) to predict.
If you call clf.fit(features_train2, label_train2) again, it trains from scratch on the new data and discards the previous results. The following are reset inside the model:

Weights
Fitted Coefficients
Bias
And other training related stuff...

If you want to keep what was previously learned and continue training on the next batch of data, use the partial_fit() method instead (available only on estimators that implement it).
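The difference between the two calls can be sketched with GaussianNB, which offers both fit() and partial_fit(). The data and variable names below are invented for illustration. Refitting keeps only the second batch, while two partial_fit() calls end up with the same per-class means as a single fit on all the data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two illustrative batches; labels alternate so both classes appear in each.
rng = np.random.default_rng(42)
X1 = rng.normal(loc=0.0, size=(50, 2))
X2 = rng.normal(loc=3.0, size=(50, 2))
y1 = np.array([0, 1] * 25)
y2 = np.array([0, 1] * 25)

# fit() twice: the second call discards what was learned from batch 1.
refit = GaussianNB()
refit.fit(X1, y1)
refit.fit(X2, y2)
only_second = GaussianNB().fit(X2, y2)

# partial_fit() twice: batch 2 is added on top of batch 1.
incremental = GaussianNB()
incremental.partial_fit(X1, y1, classes=np.array([0, 1]))
incremental.partial_fit(X2, y2)
all_at_once = GaussianNB().fit(np.vstack([X1, X2]), np.concatenate([y1, y2]))

print(refit.class_count_.sum(), incremental.class_count_.sum())
```

The class_count_ attribute makes the difference visible: the refitted model has seen 50 samples, the incrementally trained one 100.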

ANSWER 3

Score 2

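Beware that fit() returns the estimator itself, so the "models" it returns are all references to the same object. Here, model1 is overwritten by the second fit:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df1 = pd.DataFrame(np.random.rand(100).reshape(10, 10))
df2 = df1.copy()
df2.iloc[0, 0] = df2.iloc[0, 0] - 2  # change one value

pca = PCA()
model1 = pca.fit(df1)  # fit() returns pca itself
model2 = pca.fit(df2)  # refits the same object, so model1 changes too

np.unique(model1.explained_variance_ == model2.explained_variance_)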

Returns

array([ True])

To avoid this, copy the fitted estimator before fitting again:

from copy import deepcopy
model1 = deepcopy(pca.fit(df1))
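The aliasing can be checked directly with an identity test. The sketch below uses plain NumPy arrays and invented variable names rather than the DataFrames above. It shows that the return value of fit() is the estimator itself, and that a deepcopy taken before the second fit keeps the earlier state:

```python
from copy import deepcopy

import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: the second array differs from the first in one cell.
rng = np.random.default_rng(0)
data1 = rng.random((10, 10))
data2 = data1.copy()
data2[0, 0] -= 2

pca = PCA()

# Snapshot the state fitted on data1 before the estimator is reused.
snapshot = deepcopy(pca.fit(data1))

# fit() returns the estimator itself, so model2 is just another name for pca.
model2 = pca.fit(data2)
same_object = model2 is pca

# The snapshot still holds the variances learned from data1.
snapshot_differs = not np.allclose(
    snapshot.explained_variance_, model2.explained_variance_
)

print(same_object, snapshot_differs)
```

Creating a separate estimator per data set, for example PCA().fit(data1) and PCA().fit(data2), sidesteps the issue without any copying.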