Model help using Scikit-learn when using GridSearch
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Secret Catacombs
--
Chapters
00:00 Model Help Using Scikit-Learn When Using Gridsearch
01:29 Accepted Answer Score 3
01:55 Answer 2 Score 14
03:15 Thank you
--
Full question
https://stackoverflow.com/questions/4236...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #machinelearning #scikitlearn #crossvalidation #gridsearch
#avk47
ANSWER 1
Score 14
GridSearchCV, as @Gauthier Feuillen said, is used to search for the best parameters of an estimator on the given data. What GridSearchCV does:

gcv = GridSearchCV(pipe, clf_params, cv=cv)
gcv.fit(features, labels)

- clf_params will be expanded into all possible parameter combinations using ParameterGrid.
- features will be split into features_train and features_test using cv. Same for labels.
- The gridSearch estimator (pipe) will be trained using features_train and labels_train and scored using features_test and labels_test.
- For each possible parameter combination from step 1, steps 2 and 3 are repeated for the cv iterations. The average score across the cv iterations is calculated and assigned to that parameter combination; it can be accessed via the cv_results_ attribute of the gridSearch.
- For the parameters that give the best score, the internal estimator is re-initialized with those parameters and refit on the whole data supplied to it (features and labels).
Because of the last step, you get different scores with the first and second approaches: in the first approach, all of the data is used for training and you predict on that same data, while the second approach makes predictions on previously unseen data.
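The steps above can be sketched end to end. This is a minimal example, not the asker's actual pipeline: the estimator, parameter grid, and dataset are placeholders chosen to mirror the names in the snippet (pipe, clf_params, cv, features, labels).

```python
# Minimal sketch of the GridSearchCV workflow described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for the asker's features/labels.
features, labels = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
clf_params = {"svc__C": [0.1, 1, 10]}  # expanded via ParameterGrid internally
cv = 5  # each combination is scored over 5 train/test splits

gcv = GridSearchCV(pipe, clf_params, cv=cv)
gcv.fit(features, labels)

# Average score per parameter combination across the cv iterations:
print(gcv.cv_results_["mean_test_score"])

# After fitting, the internal estimator has been refit on ALL of
# (features, labels) using the best parameters:
print(gcv.best_params_)
print(gcv.best_estimator_)
```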
ACCEPTED ANSWER
Score 3
Basically the grid search will:
- Try every combination of your parameter grid
- For each of them it will do a K-fold cross validation
- Select the best available.
So your second case is the correct one. Otherwise you are actually predicting on data that you trained with (which is not the case in the second option; there you only keep the best parameters from your grid search).
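The point about predicting on seen versus unseen data can be illustrated with a held-out test set. This is a hedged sketch with placeholder data and estimator, not the asker's code:

```python
# Comparing the two approaches: scoring on training data vs. held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gcv = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
gcv.fit(X_train, y_train)  # refits the best model on all of X_train

# First approach: scoring on the data the refit estimator was trained
# on -- typically an optimistic estimate.
train_score = gcv.score(X_train, y_train)

# Second (correct) approach: scoring on previously unseen data.
test_score = gcv.score(X_test, y_test)

print(train_score, test_score)
```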