This guide demonstrates techniques for handling common errors encountered during web crawling operations.
## Handling proxy errors
Low-quality proxies can cause problems even with high settings for `max_request_retries` and `max_session_rotations` in <ApiLink to="class/BasicCrawlerOptions">`BasicCrawlerOptions`</ApiLink>. If you can't get data because of proxy errors, you might want to try again. You can do this using <ApiLink to="class/BasicCrawler#failed_request_handler">`failed_request_handler`</ApiLink>:
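The pattern can be sketched as follows. This is a minimal sketch against Crawlee's Python API; the exact import paths, the context types, and the `ProxyError` check are assumptions to verify against your Crawlee version:

```python
import asyncio

from crawlee.crawlers import BasicCrawlingContext, HttpCrawler, HttpCrawlingContext
from crawlee.errors import ProxyError  # assumed location of the proxy error type

crawler = HttpCrawler(
    max_request_retries=3,
    max_session_rotations=5,
)


@crawler.router.default_handler
async def default_handler(context: HttpCrawlingContext) -> None:
    context.log.info(f'Processing {context.request.url}')


@crawler.failed_request_handler
async def failed_handler(context: BasicCrawlingContext, error: Exception) -> None:
    # Runs after all retries are exhausted. If the failure was caused by
    # a proxy, re-enqueue the request for another round of retries.
    # Note: the request queue may deduplicate requests by unique key.
    if isinstance(error, ProxyError):
        context.log.warning(f'Proxy error for {context.request.url}, enqueueing it again.')
        await crawler.add_requests([context.request.url])


async def main() -> None:
    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```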
You can use this same approach when testing different proxy providers.
## Changing how error status codes are handled
By default, when <ApiLink to="class/Session">`Sessions`</ApiLink> get status codes like [401](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/401), [403](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/403), or [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429), Crawlee marks the <ApiLink to="class/Session">`Session`</ApiLink> as `retire` and switches to a new one. This might not be what you want, especially when working with [authentication](./logging-in-with-a-crawler). You can learn more in the [Session management guide](./session-management).
Here's an example of how to change this behavior:
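A sketch of one way to configure this, assuming `SessionPool`'s `create_session_settings` and the crawler's `ignore_http_error_status_codes` option behave as in recent Crawlee for Python releases (verify the parameter names against your version):

```python
from crawlee.crawlers import HttpCrawler
from crawlee.sessions import SessionPool

crawler = HttpCrawler(
    # Don't treat 403 as an HTTP error that fails the request outright...
    ignore_http_error_status_codes=[403],
    session_pool=SessionPool(
        # ...and don't retire the session when any status code comes back
        # (by default, codes like 401, 403, and 429 retire the session).
        create_session_settings={'blocked_status_codes': []},
    ),
)
```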
Sometimes you might get unexpected errors when parsing data, like when a website has an unusual structure. Crawlee normally tries again based on your `max_request_retries` setting, but sometimes you don't want that.
Here's how to turn off retries for non-network errors using <ApiLink to="class/BasicCrawler#error_handler">`error_handler`</ApiLink>, which runs before Crawlee tries again:
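One possible shape for such a handler. This sketch assumes `SessionError` and `ProxyError` live in `crawlee.errors` and that setting `request.no_retry` suppresses further retries; check both against your Crawlee version:

```python
from crawlee.crawlers import BasicCrawlingContext, HttpCrawler
from crawlee.errors import ProxyError, SessionError

crawler = HttpCrawler(max_request_retries=3)


@crawler.error_handler
async def retry_handler(context: BasicCrawlingContext, error: Exception) -> None:
    # Network- and proxy-related failures are worth retrying; anything else
    # (e.g. a parsing error caused by an unusual page structure) is not.
    if isinstance(error, (ProxyError, SessionError)):
        return
    context.log.warning(f'Non-network error for {context.request.url}; not retrying.')
    context.request.no_retry = True
```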