Inconclusive RandomForest documentation in scikit-learn

Full question
https://stackoverflow.com/questions/2841...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #scikitlearn #randomforest

ACCEPTED ANSWER

Score 3

I agree the first quote is self-contradictory. Maybe the following would be better:

The best results are also often reached with fully developed trees (max_depth=None and min_samples_split=1). Bear in mind, though, that these values are not guaranteed to be optimal. The best parameter values should always be cross-validated.
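
As a rough illustration of what cross-validating these parameters could look like, here is a minimal sketch using GridSearchCV. The synthetic dataset, the grid values, and the number of trees are assumptions made for the example, not recommendations. Note also that recent scikit-learn releases reject an integer min_samples_split below 2, so the grid here starts at 2 rather than the 1 quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the example is self-contained (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate values are illustrative; max_depth=None means fully developed trees.
param_grid = {
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 5, 10],
}

# 3-fold cross-validation over every combination in the grid.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

Whether the fully developed setting wins depends on the data, which is the point of the reworded sentence: the defaults are a reasonable starting point, not a guarantee.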

The second quote compares the default value of the bootstrap parameter for random forests (RandomForestClassifier and RandomForestRegressor) with the default for extremely randomized trees, as implemented in the classes ExtraTreesClassifier and ExtraTreesRegressor. The following might be more explicit:

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while for building extra-trees the default strategy is to use the original dataset (bootstrap=False).
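
The difference in defaults can be checked directly. The snippet below is a small sketch that instantiates each estimator with no arguments and reads the bootstrap value back through get_params(); the variable name defaults is just a label chosen for this example.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    ExtraTreesRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)

estimators = (
    RandomForestClassifier,
    RandomForestRegressor,
    ExtraTreesClassifier,
    ExtraTreesRegressor,
)

# Map each class name to the default value of its bootstrap parameter.
defaults = {cls.__name__: cls().get_params()["bootstrap"] for cls in estimators}

for name, value in defaults.items():
    print(f"{name}: bootstrap={value}")
```

The two random forest classes should report bootstrap=True and the two extra-trees classes bootstrap=False, matching the wording proposed above. Passing bootstrap explicitly to the constructor overrides either default.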

Please feel free to submit a PR with the fix if you find these formulations clearer.