Signals
Scrapy uses signals extensively to notify when certain actions occur. You can
catch some of those signals in your Scrapy project or extension to perform
additional tasks or extend Scrapy to add functionality not provided out of the
box.
Even though signals provide several arguments, the handlers which catch them
don’t have to receive all of them.
For more information about working with signals, see the documentation of
pydispatcher (the library used to implement them).
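The argument-filtering behaviour described above can be sketched with a toy dispatcher. This is not Scrapy's actual code; it is a self-contained illustration of how a pydispatcher-style system calls each handler with only the arguments it declares:

```python
import inspect

# Toy signal dispatcher illustrating pydispatcher-style dispatching:
# each handler receives only the keyword arguments it declares, which is
# why Scrapy signal handlers don't have to accept every argument.
_handlers = {}

def connect(handler, signal):
    """Register a handler for a signal."""
    _handlers.setdefault(signal, []).append(handler)

def send(signal, **kwargs):
    """Fire a signal, filtering kwargs down to what each handler accepts."""
    for handler in _handlers.get(signal, []):
        accepted = inspect.signature(handler).parameters
        handler(**{k: v for k, v in kwargs.items() if k in accepted})

# This handler declares only 'item' and 'spider', ignoring 'response'
scraped = []
def on_item_scraped(item, spider):
    scraped.append((item, spider))

connect(on_item_scraped, "item_scraped")
send("item_scraped", item={"name": "foo"}, spider="myspider", response=object())
```

In a real project you would use pydispatcher itself, e.g. `from pydispatch import dispatcher` and `dispatcher.connect(handler, signal=...)`, rather than this toy `connect`.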
Built-in signals reference
Here’s a list of signals used in Scrapy and their meaning, in alphabetical
order.
engine_started
scrapy.core.signals.engine_started()
Sent when the Scrapy engine is started (for example, when a crawling
process has started).
engine_stopped
scrapy.core.signals.engine_stopped()
Sent when the Scrapy engine is stopped (for example, when a crawling
process has finished).
item_scraped
scrapy.core.signals.item_scraped(item, spider, response)
Sent when the engine receives a new scraped item from the spider, right
before the item is sent to the Item Pipeline.
Parameters:
- item (Item object) – the item scraped
- spider (BaseSpider object) – the spider which scraped the item
- response (Response object) – the response from which the item was scraped
item_passed
scrapy.core.signals.item_passed(item, spider, output)
Sent after an item has passed all the Item Pipeline stages without
being dropped.
Parameters:
- item (Item object) – the item which passed the pipeline
- spider (BaseSpider object) – the spider which scraped the item
- output – the output of the item pipeline. This is typically the same Item
  object received in the item parameter, unless some pipeline stage created
  a new item.
item_dropped
scrapy.core.signals.item_dropped(item, spider, exception)
Sent after an item has been dropped from the Item Pipeline because some
stage raised a DropItem exception.
Parameters:
- item (Item object) – the item dropped from the Item Pipeline
- spider (BaseSpider object) – the spider which scraped the item
- exception (DropItem exception) – the exception (which must be a DropItem
  subclass) which caused the item to be dropped
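Taken together, the three item signals let an extension track an item's fate through the pipeline. Below is a sketch of matching handlers; the `stats` dict is illustrative, and the commented-out connection calls show the assumed way to wire them up in a real project:

```python
# Counters for the three item signals above; the dict is illustrative.
stats = {"scraped": 0, "passed": 0, "dropped": 0}

def item_scraped(item, spider, response):
    # fired before the item enters the pipeline
    stats["scraped"] += 1

def item_passed(item, spider, output):
    # fired after the item survived every pipeline stage
    stats["passed"] += 1

def item_dropped(item, spider, exception):
    # fired when some stage raised a DropItem exception
    stats["dropped"] += 1

# In a Scrapy extension you would connect them roughly like this
# (assumed usage; requires a running Scrapy project):
# from pydispatch import dispatcher
# from scrapy.core import signals
# dispatcher.connect(item_scraped, signal=signals.item_scraped)
# dispatcher.connect(item_passed, signal=signals.item_passed)
# dispatcher.connect(item_dropped, signal=signals.item_dropped)
```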
spider_closed
scrapy.core.signals.spider_closed(spider, reason)
Sent after a spider has been closed. This can be used to release per-spider
resources reserved on spider_opened.
Parameters:
- spider (BaseSpider object) – the spider which has been closed
- reason (str) – a string describing why the spider was closed. If it was
  closed because the spider completed scraping, the reason is 'finished'.
  If the spider was manually closed by calling the close_spider engine
  method, the reason is the one passed in the reason argument of that
  method (which defaults to 'cancelled'). If the engine was shut down (for
  example, by hitting Ctrl-C to stop it), the reason is 'shutdown'.
spider_opened
scrapy.core.signals.spider_opened(spider)
Sent after a spider has been opened for crawling. This is typically used to
reserve per-spider resources, but can be used for any task that needs to be
performed when a spider is opened.
Parameters:
- spider (BaseSpider object) – the spider which has been opened
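spider_opened and spider_closed pair naturally for the resource management described above. A minimal sketch, assuming each spider gets its own scratch file (the file handling and dict name are illustrative, not a fixed Scrapy API):

```python
import os
import tempfile

# Per-spider resources, keyed by spider: reserved on spider_opened and
# released on spider_closed, as the signal descriptions above suggest.
open_files = {}

def spider_opened(spider):
    # reserve a per-spider resource: here, a temporary scratch file
    fd, path = tempfile.mkstemp(prefix="spider-")
    open_files[spider] = (fd, path)

def spider_closed(spider, reason):
    # release whatever spider_opened reserved for this spider
    fd, path = open_files.pop(spider)
    os.close(fd)
    os.remove(path)
```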
spider_idle
scrapy.core.signals.spider_idle(spider)
Sent when a spider has gone idle, which means the spider has no further:
- requests waiting to be downloaded
- requests scheduled
- items being processed in the item pipeline
If the idle state persists after all handlers of this signal have finished,
the engine starts closing the spider. After the spider has finished closing,
the spider_closed signal is sent.
You can, for example, schedule some requests in your spider_idle handler to
prevent the spider from being closed.
Parameters:
- spider (BaseSpider object) – the spider which has gone idle
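One way to use spider_idle, sketched below: keep a queue of extra URLs and feed one back each time the spider goes idle. The handler signature matches the signal; the call that actually schedules a Request from a handler is left as a comment because the engine API for doing so depends on your Scrapy version, and the URLs are purely illustrative:

```python
# Extra URLs to crawl once the spider runs out of work (illustrative).
pending_urls = ["http://example.com/more/1", "http://example.com/more/2"]

def spider_idle(spider):
    """Feed one pending URL back to the engine to keep the spider alive."""
    if not pending_urls:
        return None  # nothing left: let the engine close the spider
    url = pending_urls.pop(0)
    # In a real extension you would schedule a Request for `url` here,
    # e.g. through the engine's crawl method (version-dependent API).
    return url
```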
request_received
scrapy.core.signals.request_received(request, spider)
Sent when the engine receives a Request from a spider.
Parameters:
- request (Request object) – the request received
- spider (BaseSpider object) – the spider which generated the request
request_uploaded
scrapy.core.signals.request_uploaded(request, spider)
Sent right after the downloader has sent a Request.
Parameters:
- request (Request object) – the request uploaded/sent
- spider (BaseSpider object) – the spider which generated the request
response_received
scrapy.core.signals.response_received(response, spider)
Sent when the engine receives a new Response from the downloader.
Parameters:
- response (Response object) – the response received
- spider (BaseSpider object) – the spider for which the response is intended
response_downloaded
scrapy.core.signals.response_downloaded(response, spider)
Sent by the downloader right after an HTTPResponse is downloaded.
Parameters:
- response (Response object) – the response downloaded
- spider (BaseSpider object) – the spider for which the response is intended
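The request/response signals above are a natural place to collect download statistics. A self-contained sketch of two such handlers (the counter names are illustrative, and wiring them up would again go through pydispatcher in a real project):

```python
# Per-spider counters updated from the request/response signals above.
requests_sent = {}
responses_downloaded = {}

def request_uploaded(request, spider):
    # fired right after the downloader sends a request
    requests_sent[spider] = requests_sent.get(spider, 0) + 1

def response_downloaded(response, spider):
    # fired right after a response is downloaded
    responses_downloaded[spider] = responses_downloaded.get(spider, 0) + 1
```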