The Python Oracle

Does NLTK have TF-IDF implemented?

Full question
https://stackoverflow.com/questions/2957...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #nlp #nltk #tfidf



ACCEPTED ANSWER

Score 11


The NLTK TextCollection class (in nltk.text) has a method, tf_idf, for computing the tf-idf of terms. See its documentation and source for details. However, the documentation says it "may be slow to load", so using scikit-learn may be preferable.
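As a rough illustration of the method mentioned above, here is a minimal sketch of calling tf_idf on a TextCollection. The three tiny token lists are made-up sample data, not from the original answer, and the sketch assumes NLTK is installed and that documents are passed in already tokenized.

```python
from nltk.text import TextCollection

# Hypothetical sample data: three small, pre-tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred", "loudly"],
]

# Build the collection from the list of token lists.
collection = TextCollection(docs)

# tf_idf(term, text) multiplies the term's frequency in `text`
# by its inverse document frequency across the whole collection.
cat_score = collection.tf_idf("cat", docs[0])

# "the" occurs in every document, so its idf (and tf-idf) is zero.
the_score = collection.tf_idf("the", docs[0])

print(f"cat: {cat_score:.4f}")
print(f"the: {the_score:.4f}")
```

A term found in fewer documents gets a higher score than one found everywhere, which is why "cat" outranks "the" here. For larger corpora, a vectorizer from scikit-learn is likely the more practical choice, as the answer suggests.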




ANSWER 2

Score 4


I think there is enough evidence to conclude that TF-IDF does not exist in NLTK:

  1. "Unfortunately, calculating tf-idf is not available in NLTK so we'll use another data analysis library, scikit-learn"

    (from the COMPSCI 290-01 Spring 2014 lab)

  2. More importantly, the source code contains nothing related to tfidf (or tf-idf). The exception is NLTK-contrib, which contains a map-reduce implementation of TF-IDF.

Several libraries for tf-idf are mentioned in a related question.

Update: searching for "tf idf" or "tf_idf" does turn up the function already found by @yvespeirsman.