Difference between BeautifulSoup and Scrapy crawler?

Full question
https://stackoverflow.com/questions/1968...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #beautifulsoup #scrapy #webcrawler

ACCEPTED ANSWER

Score 282

Scrapy is a web-spider (web-scraping) framework. You give Scrapy a root URL to start crawling from, and you can then specify constraints such as how many URLs to crawl and fetch. It is a complete framework for web scraping and crawling.

BeautifulSoup, by contrast, is a parsing library. It does not download anything itself: you fetch a page with an HTTP client (for example urllib or requests), hand the HTML to BeautifulSoup, and it lets you pull out specific parts without any hassle. It works only on the content of the URL you give it and then stops. It does not crawl unless you manually put it inside a loop with your own stopping criteria.
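
A small sketch of the single-page case, assuming the download is done with the standard library's `urllib.request` and BeautifulSoup is used only for parsing. The function names and the sample HTML are made up for this example.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url):
    """Download one page and parse it; nothing is followed afterwards."""
    with urlopen(url) as resp:
        return extract_links(resp.read())


# Offline demonstration on an inline HTML snippet.
sample = "<html><body><a href='/a'>A</a><a name='x'>no href</a><a href='/b'>B</a></body></html>"
print(extract_links(sample))  # ['/a', '/b']
```

Calling `fetch_links("https://example.com/")` would process exactly that one page and stop, which is the behaviour described above.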

In simple words, with Beautiful Soup you can build something similar to Scrapy. Beautiful Soup is a library while Scrapy is a complete framework.
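
To make "you can build something similar" concrete, here is a hedged sketch of a hand-rolled crawl loop around BeautifulSoup. The function names, the `max_pages` limit and the same-domain rule are assumptions for illustration; a real crawler would also need error handling, politeness delays and robots.txt checks, which Scrapy provides or supports out of the box.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Default downloader: plain urllib, no retries or error handling."""
    with urlopen(url) as resp:
        return resp.read()


def crawl(start_url, max_pages=10, fetch_page=fetch):
    """Breadth-first crawl of one domain; returns {url: page title}."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        soup = BeautifulSoup(fetch_page(url), "html.parser")
        titles[url] = soup.title.get_text(strip=True) if soup.title else None
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            # Stay on the starting domain and never queue a URL twice.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return titles
```

The `fetch_page` parameter exists only so the loop can be tried against canned pages; everything Scrapy does beyond this loop (scheduling, throttling, pipelines) would have to be written by hand.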

ANSWER 2

Score 21

I think both are good. I'm working on a project right now that uses both. First I scrape all the pages with Scrapy and save them to a MongoDB collection using its item pipelines, also downloading the images that exist on each page. After that I use BeautifulSoup4 for post-processing, where I have to change attribute values and extract some specific tags.
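
The post-processing step might look roughly like the sketch below, which rewrites `src` attributes to point at locally downloaded images. The function name, the `/media/` prefix and the sample HTML are assumptions; the answer does not say which attributes or tags were actually changed.

```python
from bs4 import BeautifulSoup


def rewrite_images(html, local_prefix="/media/"):
    """Point every <img src> at a local copy, keeping only the file name."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img", src=True):
        filename = img["src"].rsplit("/", 1)[-1]
        img["src"] = local_prefix + filename  # change the attribute value
    return str(soup)


stored_html = '<p><img src="https://cdn.example.com/x/pic.jpg"></p>'
print(rewrite_images(stored_html))
```

Here `stored_html` stands in for a page body previously saved by the Scrapy pipeline.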

If you don't know in advance which product pages you want, Scrapy is a good tool, since you can use its crawlers to walk an entire Amazon or eBay site looking for the products without writing an explicit for loop.

Take a look at the Scrapy documentation; it's very simple to use.

ANSWER 3

Score 3

Both are used to extract data.

Scrapy:

Scrapy is a fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
It has some limitations when the data comes from JavaScript or is loaded dynamically, but we can overcome them with packages such as Splash or Selenium.

BeautifulSoup:

Beautiful Soup is a Python library for pulling data out of HTML and XML files.
It does not run JavaScript itself, so for JavaScript-driven or dynamically loaded pages we can use it on the HTML once a tool such as Selenium has rendered the page.

Scrapy combined with BeautifulSoup is one of the best combinations we can work with for scraping static and dynamic content.
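
For dynamic pages, one common arrangement is to let a browser-automation tool render the page and then hand the resulting HTML to BeautifulSoup. The sketch below assumes a Selenium-style driver object (anything exposing `get()` and `page_source`); the `li.item` selector is a made-up example.

```python
from bs4 import BeautifulSoup


def parse_rendered_page(driver, url):
    """Load url in the given driver, then parse the rendered DOM."""
    driver.get(url)  # the browser executes the page's JavaScript
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Hypothetical selector: list items the page's script filled in.
    return [li.get_text(strip=True) for li in soup.select("li.item")]
```

With Selenium installed, this could be called as `parse_rendered_page(webdriver.Firefox(), url)` after `from selenium import webdriver`, followed by `driver.quit()` when done.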

ANSWER 4

Score 2

The way I do it is to use the eBay/Amazon APIs rather than Scrapy, and then parse the results using BeautifulSoup.

The APIs give you an official way of getting the same data that you would have got from a Scrapy crawler, with no need to worry about hiding your identity, messing about with proxies, etc.