Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

python - Scrapy - Change closing reason from "finished" to "myReason"

I would like to achieve the following:

  1. Run spider until finished
  2. Count scraped items
  3. If number_of_items > x: reason=finished (nothing to be done)
  4. If number_of_items <= x: reason=insufficient_number (change reason accordingly)
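Steps 3 and 4 boil down to a single threshold check. A minimal sketch of that decision, kept separate from Scrapy; the function name `closing_reason` and the `threshold` parameter (the "x" above) are hypothetical:

```python
def closing_reason(item_count: int, threshold: int) -> str:
    """Return the close reason for a finished crawl (hypothetical helper)."""
    # Strictly more items than the threshold: keep Scrapy's default reason.
    if item_count > threshold:
        return "finished"
    # Otherwise report that too few items were scraped.
    return "insufficient_number"
```

Note that a count exactly equal to the threshold is treated as insufficient, matching the `<=` in step 4.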

The first two steps work. However, I'm struggling with the last two, as I'm not sure how to set the closing reason manually. Here is what I have tried so far:

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'  # placeholder; Scrapy requires every spider to have a name
    start_urls = ['https://example.com']

    def start_requests(self):
        yield scrapy.Request(url=self.start_urls[0], callback=self.parse)

    def close(self, spider, reason):
        # Here I want to change the reason.
        # I tried changing spider.crawler.stats.get_stats()['finish_reason'],
        # but this only changes the stats (of course),
        # not the value logged in "INFO: Closing spider (finished)".
        pass

    def parse(self, response):
        ...

Thanks for your support on this.



1 Answer

Waiting for an expert to reply.
