Combination of GridSearchCV's refit and scorer unclear

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: City Beneath the Waves Looping

--

Chapters
00:00 Combination Of Gridsearchcv'S Refit And Scorer Unclear
00:52 Accepted Answer Score 11
01:51 Thank you

--

Full question
https://stackoverflow.com/questions/5759...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #scikitlearn #gridsearch

#avk47

ACCEPTED ANSWER

Score 11

When refit=True, sklearn uses entire training set to refit the model. So, there is no test data left to estimate the performance using any scorer function.

If you use multiple scorer in GridSearchCV, maybe f1_score or precision along with your balanced_accuracy, sklearn needs to know which one of those scorer to use to find the "inner winner" as you say. For example with KNN, f1_score might have best result with K=5, but accuracy might be highest for K=10. There is no way for sklearn to know which value of hyper-parameter K is the best.

To resolve that, you can pass one string scorer to refit to specify which of those scorer should ultimately decide best hyper-parameter. This best value will then be used to retrain or refit the model using full dataset. So, when you've got just one scorer, as your case seems to be, you don't have to worry about this. Simply refit=True will suffice.