The Python Oracle

Combination of GridSearchCV's refit and scorer unclear

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Jungle Looping

--

Chapters
00:00 Question
01:09 Accepted answer (Score 9)
02:12 Thank you

--

Full question
https://stackoverflow.com/questions/5759...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #scikitlearn #gridsearch

#avk47



ACCEPTED ANSWER

Score 11


When refit=True, sklearn uses entire training set to refit the model. So, there is no test data left to estimate the performance using any scorer function.

If you use multiple scorer in GridSearchCV, maybe f1_score or precision along with your balanced_accuracy, sklearn needs to know which one of those scorer to use to find the "inner winner" as you say. For example with KNN, f1_score might have best result with K=5, but accuracy might be highest for K=10. There is no way for sklearn to know which value of hyper-parameter K is the best.

To resolve that, you can pass one string scorer to refit to specify which of those scorer should ultimately decide best hyper-parameter. This best value will then be used to retrain or refit the model using full dataset. So, when you've got just one scorer, as your case seems to be, you don't have to worry about this. Simply refit=True will suffice.