The application crashes when both use_session_pool=False and context.session.retire() are used. #888


Closed
matecsaj opened this issue Jan 9, 2025 · 4 comments
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

matecsaj (Contributor) commented Jan 9, 2025

Python code

import asyncio
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext   # V0.5.0

async def main() -> None:
    crawler = PlaywrightCrawler(use_session_pool=False)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.session.retire()

    await crawler.run(['https://crawlee.dev'])

if __name__ == '__main__':
    asyncio.run(main())

Command-line output

/Users/matecsaj/PycharmProjects/wat-crawlee/venv/bin/python /Users/matecsaj/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch_5.py 
[crawlee._autoscaling.snapshotter] INFO  Setting max_memory_size of this run to 8.00 GB.
[crawlee.crawlers._playwright._playwright_crawler] INFO  Current request statistics:
┌───────────────────────────────┬──────────┐
│ requests_finished             │ 0        │
│ requests_failed               │ 0        │
│ retry_histogram               │ [0]      │
│ request_avg_failed_duration   │ None     │
│ request_avg_finished_duration │ None     │
│ requests_finished_per_minute  │ 0        │
│ requests_failed_per_minute    │ 0        │
│ request_total_duration        │ 0.0      │
│ requests_total                │ 0        │
│ crawler_runtime               │ 0.000736 │
└───────────────────────────────┴──────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0.0; mem = 0.0; event_loop = 0.0; client_info = 0.0
[crawlee.crawlers._playwright._playwright_crawler] ERROR Request failed and reached maximum retries
      Traceback (most recent call last):
        File "/Users/matecsaj/PycharmProjects/wat-crawlee/venv/lib/python3.13/site-packages/crawlee/crawlers/_basic/_context_pipeline.py", line 79, in __call__
          await final_context_consumer(cast(TCrawlingContext, crawling_context))
        File "/Users/matecsaj/PycharmProjects/wat-crawlee/venv/lib/python3.13/site-packages/crawlee/router.py", line 57, in __call__
          return await self._default_handler(context)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/matecsaj/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch_5.py", line 9, in request_handler
          context.session.retire()
          ^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: 'NoneType' object has no attribute 'retire'
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[crawlee.crawlers._playwright._playwright_crawler] INFO  Error analysis: total_errors=3 unique_errors=1
[crawlee.crawlers._playwright._playwright_crawler] INFO  Final request statistics:
┌───────────────────────────────┬───────────┐
│ requests_finished             │ 0         │
│ requests_failed               │ 1         │
│ retry_histogram               │ [0, 0, 1] │
│ request_avg_failed_duration   │ 0.705511  │
│ request_avg_finished_duration │ None      │
│ requests_finished_per_minute  │ 0         │
│ requests_failed_per_minute    │ 11        │
│ request_total_duration        │ 0.705511  │
│ requests_total                │ 1         │
│ crawler_runtime               │ 5.27278   │
└───────────────────────────────┴───────────┘

Process finished with exit code 0
github-actions bot added the t-tooling label Jan 9, 2025
Mantisus (Collaborator) commented Jan 9, 2025

The use_session_pool=False parameter disables the SessionPool. Therefore context.session is None, and this behavior is expected.

matecsaj (Contributor, Author) commented Jan 9, 2025

Could the system handle this scenario more gracefully? Perhaps context.session.retire() could check for a None case and simply return if it encounters it.

If this scenario is something the user is expected to address, an error log with a clear explanation of the issue and guidance on how to resolve it would be helpful.

Mantisus (Collaborator) commented Jan 9, 2025

From my point of view, use_session_pool=False is not the default configuration. And a user who uses it makes a conscious choice that the system will not use sessions.

So this check may be needed either when experimenting with the code or in some complex cases that the user defines themselves.

A simple case is:

if context.session:
    context.session.retire()

B4nan (Member) commented Jan 9, 2025

Correct, if you disable the session pool, there is no session in the context. That is also why it's typed as Session | None. This is working as expected, so closing.

https://crawlee.dev/python/api/class/BasicCrawlingContext#session
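Because the attribute is typed `Session | None`, a static checker such as mypy will flag an unguarded `.retire()` call before it ever runs. A minimal sketch of the narrowing pattern, with a hypothetical `Session` stand-in and `handle` helper (not crawlee's actual API):

```python
from typing import Optional

# Hypothetical stand-in; in crawlee, BasicCrawlingContext.session
# is typed Session | None.
class Session:
    def __init__(self) -> None:
        self.retired = False

    def retire(self) -> None:
        self.retired = True

def handle(session: Optional[Session]) -> str:
    # An unguarded session.retire() here would fail type checking
    # (and raise AttributeError at runtime) because session may be None.
    if session is None:
        return "no session (pool disabled)"
    session.retire()  # session is narrowed to Session on this branch
    return "retired"

print(handle(None))       # no session (pool disabled)
print(handle(Session()))  # retired
```

Running a type checker over the handler would have surfaced this before the crawl, which is the practical benefit of the `Session | None` annotation.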

@B4nan B4nan closed this as completed Jan 9, 2025