The Python Oracle

scikit-learn OpenMP libsvm

--

Full question
https://stackoverflow.com/questions/1327...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #openmp #libsvm #scikitlearn




ACCEPTED ANSWER

Score 8


There is no OpenMP support in the current scikit-learn binding for libsvm. However, if you are having performance issues with sklearn.svm.SVC, it is very likely that you should use a more scalable model instead.

If your data is high-dimensional, it might be linearly separable. In that case, it is advisable to first try simpler models such as naive Bayes models or sklearn.linear_model.Perceptron, which are known to be very fast to train. You can also try sklearn.linear_model.LogisticRegression (with solver='liblinear') and sklearn.svm.LinearSVC, both of which are implemented with liblinear. liblinear is more scalable than libsvm, albeit less memory efficient than the other linear models in scikit-learn.
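As a rough way to compare the fast models mentioned above on your own machine, here is a minimal timing sketch. It assumes a synthetic dataset from make_classification as a stand-in for real high-dimensional data; the helper name time_linear_models and the dataset sizes are hypothetical, and the training accuracy it reports is only a sanity check, not a proper evaluation.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC


def time_linear_models(n_samples=5000, n_features=200, random_state=0):
    """Hypothetical helper: fit several fast models, return {name: (fit_seconds, train_accuracy)}."""
    # Synthetic stand-in data; replace with your own X, y.
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=20, random_state=random_state
    )
    models = {
        "GaussianNB": GaussianNB(),
        "Perceptron": Perceptron(random_state=random_state),
        # Explicitly select the liblinear solver discussed in the answer.
        "LogisticRegression (liblinear)": LogisticRegression(solver="liblinear"),
        # dual=False is the recommended setting when n_samples > n_features.
        "LinearSVC": LinearSVC(dual=False),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X, y)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, model.score(X, y))
    return results


if __name__ == "__main__":
    for name, (seconds, acc) in time_linear_models().items():
        print(f"{name:32s} fit={seconds:.3f}s train_acc={acc:.3f}")
```

Absolute timings depend heavily on hardware and data, so treat the output as a relative comparison only.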

If your data is not linearly separable, you can try sklearn.ensemble.ExtraTreesClassifier (adjust the n_estimators parameter to trade off training speed against predictive accuracy).
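The n_estimators trade-off can be explored with a small sweep like the one below. This is a sketch under assumptions: make_moons serves as a stand-in non-linearly-separable dataset, the helper sweep_n_estimators and the values swept are hypothetical, and n_jobs=-1 (ExtraTreesClassifier's own parallelism parameter, which the answer does not mention) is added so that tree building uses all cores.

```python
import time

from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split


def sweep_n_estimators(values=(10, 50, 200), random_state=0):
    """Hypothetical helper: return a list of (n_estimators, fit_seconds, test_accuracy)."""
    # make_moons gives a small 2-D problem that no linear model can separate.
    X, y = make_moons(n_samples=2000, noise=0.3, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
    rows = []
    for n in values:
        # More trees: slower fit, usually a more stable (often better) prediction.
        clf = ExtraTreesClassifier(n_estimators=n, n_jobs=-1, random_state=random_state)
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        rows.append((n, elapsed, clf.score(X_test, y_test)))
    return rows


if __name__ == "__main__":
    for n, seconds, acc in sweep_n_estimators():
        print(f"n_estimators={n:4d} fit={seconds:.3f}s test_acc={acc:.3f}")
```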

Alternatively, you can approximate an RBF kernel using scikit-learn's RBFSampler transformer and then fit a linear model on its output:

http://scikit-learn.org/dev/modules/kernel_approximation.html
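A minimal sketch of the RBFSampler approach described above, assuming a make_circles toy dataset and illustrative values gamma=1.0 and n_components=300 (tune both for real data). The helper rbf_approx_model is hypothetical; RBFSampler, LinearSVC and make_pipeline are standard scikit-learn.

```python
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def rbf_approx_model(gamma=1.0, n_components=300, random_state=0):
    """Hypothetical helper: random Fourier features followed by a linear SVM."""
    # RBFSampler maps inputs to random features whose inner products approximate
    # the RBF kernel exp(-gamma * ||x - y||^2); a linear model trained on them
    # therefore behaves approximately like an RBF-kernel SVM, but fits much faster.
    return make_pipeline(
        RBFSampler(gamma=gamma, n_components=n_components, random_state=random_state),
        LinearSVC(dual=False),
    )


# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = rbf_approx_model().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Increasing n_components makes the kernel approximation more accurate at the cost of more memory and a slower linear fit.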




ANSWER 2

Score 2


If you are using cross-validation or grid search in scikit-learn, you can use multiple CPUs with the n_jobs parameter:

GridSearchCV(..., n_jobs=-1)
cross_val_score(..., n_jobs=-1)

Note that cross_val_score only uses one job per fold, so if you have fewer folds than CPUs, you still won't be using all of your processing power.
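Here is a runnable version of the two calls above, using the iris dataset and an illustrative C/gamma grid (both are assumptions, chosen only to keep the example small). With cv=5, cross_val_score has only 5 fits to distribute, whereas the 3 x 3 grid gives GridSearchCV 45 independent fits, which is why grid search can keep more cores busy.

```python
import os

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One fit per fold: with cv=5, at most 5 workers do useful work,
# however many cores n_jobs=-1 makes available.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5, n_jobs=-1)

# GridSearchCV parallelises over (parameter combination, fold) pairs:
# 3 values of C * 3 values of gamma * 5 folds = 45 independent fits.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print("CPUs available:", os.cpu_count())
print("CV accuracy per fold:", scores)
print("best params:", search.best_params_, "best score:", round(search.best_score_, 3))
```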

LibSVM itself can use OpenMP if you compile it with OpenMP enabled and use it directly, as described in the LibSVM FAQ. So you could export your scaled data in LibSVM format (there is a StackOverflow question on how to do that) and train with LibSVM directly. However, that is only beneficial if you're grid searching or just want accuracy scores: as far as I know, the model LibSVM creates cannot be used in scikit-learn.
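One way to do the export step is scikit-learn's dump_svmlight_file, which writes the sparse "label index:value" text format that LibSVM's command-line tools (such as svm-train) read. This is a sketch under assumptions: the helper export_for_libsvm is hypothetical, MinMaxScaler with a [-1, 1] range is chosen to mirror svm-scale's default range, and random data stands in for your own. Features are written 1-based (zero_based=False), following LibSVM's usual convention.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
from sklearn.preprocessing import MinMaxScaler


def export_for_libsvm(X, y, path):
    """Hypothetical helper: scale X to [-1, 1] and write it in LibSVM/svmlight format."""
    # Remember to apply the same fitted scaler to any test data you export later.
    X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
    dump_svmlight_file(X_scaled, y, path, zero_based=False)
    return X_scaled


# Stand-in data; replace with your own features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = (X[:, 0] > 0).astype(int)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.libsvm")
    X_scaled = export_for_libsvm(X, y, path)
    # Round-trip check: read the file back to confirm it was written correctly.
    X_back, y_back = load_svmlight_file(path, n_features=X.shape[1], zero_based=False)
```

The resulting file can then be passed to a LibSVM build compiled with OpenMP, outside of scikit-learn.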

There is also a GPU-accelerated version of LibSVM, which I have tried and found to be extremely fast, but it is not based on the current LibSVM version. I have talked to the developers, and they say they hope to release a new version soon.




ANSWER 3

Score 2


Although this thread is over a year old, I thought it was worth answering.

I wrote a patch that adds OpenMP support to scikit-learn for both libsvm and liblinear (LinearSVC); it's available here: https://github.com/fidlr/sklearn-openmp.

It is based on the libsvm FAQ entry on how to add OpenMP support and on the multi-core implementation of liblinear.

Just clone the repo and run sklearn-build-openmp.sh to apply the patch and build it.

Timing results for OMP_NUM_THREADS=4 python plot_permutation_test_for_classification.py:

  • libsvm with a linear kernel: run time dropped by a factor of 2.3
  • libsvm with an RBF kernel: same speedup
  • liblinear with 4 threads: run time dropped by a factor of 1.6
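To reproduce this kind of measurement, a simple harness can time SVC.fit and express the result as a speedup factor (baseline time divided by threaded time, so 2.3 means 2.3x faster). This is a hypothetical stand-in for the benchmark script above, not the patch author's method: the helpers time_svc_fit and speedup are illustrative, the data is synthetic, and with a stock (unpatched) scikit-learn you should expect no difference between thread counts, since its libsvm binding does not use OpenMP. OMP_NUM_THREADS is read by the OpenMP runtime at start-up, so set it in the shell and run the script once per value.

```python
import os
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC


def time_svc_fit(kernel="linear", n_samples=1000, repeats=3, random_state=0):
    """Hypothetical helper: best-of-`repeats` wall-clock seconds for SVC.fit."""
    X, y = make_classification(n_samples=n_samples, n_features=20, random_state=random_state)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        SVC(kernel=kernel).fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, threaded_seconds):
    """Factor by which run time dropped, e.g. 2.3 means 2.3x faster."""
    return baseline_seconds / threaded_seconds


if __name__ == "__main__":
    # Run e.g. with OMP_NUM_THREADS=1 and then OMP_NUM_THREADS=4, and compare.
    threads = os.environ.get("OMP_NUM_THREADS", "unset")
    for kernel in ("linear", "rbf"):
        print(f"OMP_NUM_THREADS={threads} kernel={kernel}: {time_svc_fit(kernel):.3f}s")
```

Taking the best of several runs reduces noise from other processes; pass the single-thread and four-thread timings to speedup to get a factor comparable to those reported above.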

Details and usage information can be found here: http://fidlr.org/post/137303264732/scikit-learn-017-with-libsvm-openmp-support