Breaking changes for v0.6 #906

Closed
vdusek opened this issue Jan 15, 2025 · 6 comments
Labels: debt (Code quality improvement or decrease of technical debt.), t-tooling (Issues with this label are in the ownership of the tooling team.)

@vdusek
Collaborator

vdusek commented Jan 15, 2025

Remove unused fields in the Configuration

  • Fields to remove: chrome_executable_path, xvfb.
  • Additionally, consider removing the verbose_log field:
    • The verbose_log field is only used to derive the log level, and a log_level option already exists, so I consider it redundant.
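
To illustrate the redundancy argument, here is a minimal sketch. The Configuration class below is a hypothetical, trimmed-down stand-in (not the real crawlee class), and mapping verbose_log to DEBUG is an assumption about how such a flag is usually resolved.

```python
import logging
from dataclasses import dataclass


@dataclass
class Configuration:
    """Hypothetical stand-in with only the two fields under discussion."""

    log_level: str = 'INFO'
    verbose_log: bool = False  # the field proposed for removal


def effective_log_level(config: Configuration) -> int:
    """Resolve the numeric logging level from the configuration.

    Assumption: a truthy verbose_log simply forces DEBUG.
    """
    if config.verbose_log:
        return logging.DEBUG
    # Look up the standard level constant by name, e.g. logging.INFO.
    return getattr(logging, config.log_level.upper())


# Both configurations resolve to the same level, so log_level alone suffices.
via_flag = effective_log_level(Configuration(verbose_log=True))
via_level = effective_log_level(Configuration(log_level='DEBUG'))
```

Under that assumption, anything expressible with verbose_log=True is already expressible with log_level='DEBUG'.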

Refactor abstract class naming

  • Remove "Base" from the names of abstract classes (e.g., BaseStorage, BaseStorageClient, ...).
  • @janbuchar and I agreed not to use Hungarian notation (where the name of a variable indicates its intention or kind). @Pijukatel, feel free to share your thoughts.
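
A rough sketch of what the rename could look like. The Storage name follows the proposal above (BaseStorage without the prefix); the name() method and the Dataset subclass are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Abstract storage interface; under the old naming this was BaseStorage."""

    @abstractmethod
    def name(self) -> str:
        """Return a human-readable storage name (hypothetical method)."""


class Dataset(Storage):
    """Hypothetical concrete storage used only for this example."""

    def name(self) -> str:
        return 'dataset'


dataset = Dataset()
```

The abstractness is still enforced by ABC at instantiation time, so the class name does not need to carry that information.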

Rename enqueue_links for clarity

  • The naming of enqueue_links causes confusion due to its similarity to add_requests.
  • Proposed solution:
    • Rename enqueue_links to something more descriptive, such as extract_links (indicating what it does more precisely).
  • I am open to name suggestions here.

Update of enqueue_links

  • enqueue_links will have the same interface as its JavaScript counterpart, enqueueLinks.
  • add_requests will remain dedicated to adding requests to the RQ only.
  • A new extract_links function can be introduced for link extraction only.
  • Internally, enqueue_links can utilize both add_requests and extract_links.
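
The split described above could be sketched roughly as follows. FakeContext is a hypothetical stand-in, not the crawlee API: its signatures, the plain list used in place of the RQ, and the standard-library HTML parsing are all assumptions made to keep the example self-contained.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


class FakeContext:
    """Hypothetical crawling context; names mirror the proposal only."""

    def __init__(self, base_url: str, html: str) -> None:
        self.base_url = base_url
        self.html = html
        self.queue: list[str] = []  # stands in for the request queue (RQ)

    async def extract_links(self) -> list[str]:
        """Link extraction only: parse anchors and resolve them to absolute URLs."""
        parser = _AnchorParser()
        parser.feed(self.html)
        return [urljoin(self.base_url, href) for href in parser.hrefs]

    async def add_requests(self, urls: list[str]) -> None:
        """Adding to the queue only: no extraction logic here."""
        self.queue.extend(urls)

    async def enqueue_links(self) -> None:
        """Convenience wrapper composed from the two primitives."""
        await self.add_requests(await self.extract_links())


ctx = FakeContext(
    'https://example.com/',
    '<a href="/a">A</a><a href="b">B</a><p>no link</p>',
)
asyncio.run(ctx.enqueue_links())
```

With this layering, users who only want the URLs can call extract_links directly, while enqueue_links remains a one-call shortcut.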

@vdusek added the t-tooling label Jan 15, 2025

@Pijukatel
Collaborator

Regarding abstract class naming, there is one more thing to think about: the change will affect both BaseSomeClass and AbstractSomeClass names. What will we do when a default implementation of such a class exists? For example, HttpCrawler is one implementation of AbstractHttpCrawler (the HttpCrawler name was reserved for backwards compatibility).

@vdusek
Collaborator Author

vdusek commented Jan 15, 2025

Maybe we would have to make an exception for AbstractHttpCrawler, or rename HttpCrawler to something else.

@Mantisus
Collaborator

Mantisus commented Feb 5, 2025

I'm thinking of moving from a dict to a CookieJar-based approach for storing cookies in Session for #933. That would be an appropriate change for 0.6.
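
The comment does not spell out the motivation, but one plausible difference can be sketched with the standard library's http.cookiejar: a flat name-to-value dict cannot hold two cookies with the same name for different domains, while a CookieJar keys cookies by domain, path and name. The make_cookie helper is hypothetical and only exists to keep the example short.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, value: str, domain: str, path: str = '/') -> Cookie:
    """Hypothetical helper that fills in the verbose Cookie constructor."""
    return Cookie(
        version=0,
        name=name,
        value=value,
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path=path,
        path_specified=True,
        secure=False,
        expires=None,
        discard=True,
        comment=None,
        comment_url=None,
        rest={},
    )


# Dict-based storage: the second assignment overwrites the first.
dict_cookies: dict[str, str] = {}
dict_cookies['token'] = 'a'  # meant for example.com
dict_cookies['token'] = 'b'  # meant for other.test

# CookieJar-based storage: both cookies are kept, scoped by domain.
jar = CookieJar()
jar.set_cookie(make_cookie('token', 'a', 'example.com'))
jar.set_cookie(make_cookie('token', 'b', 'other.test'))

domains = sorted(cookie.domain for cookie in jar)
```

Changing the stored type is visible to anyone reading or writing session cookies directly, which is why it fits a release that already carries breaking changes.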

@vdusek added this to the 108th sprint - Tooling team milestone Feb 10, 2025
@vdusek added the debt label Feb 10, 2025
@vdusek added a commit that referenced this issue Feb 12, 2025
@vdusek
Collaborator Author

vdusek commented Feb 12, 2025

One more thing from Ruff 0.9:

src/crawlee/statistics/__init__.py:1:1: A005 Module `statistics` shadows a Python standard-library module

Since statistics is a public module, we should consider renaming it for the v0.6 release. Maybe just stats? Or do you have any other suggestions?

@Pijukatel @janbuchar

@janbuchar
Collaborator

I wouldn't mind disabling the rule in this case. It is unlikely that anyone will do from crawlee import statistics, since that would be cumbersome to work with.
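
A small experiment backs up this reasoning. The sketch below builds a throwaway package named demo_pkg (a hypothetical name, standing in for crawlee) that contains a statistics submodule, then checks that a plain import statistics still resolves to the standard library.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/statistics/__init__.py under the given directory."""
    submodule = root / 'demo_pkg' / 'statistics'
    submodule.mkdir(parents=True)
    (root / 'demo_pkg' / '__init__.py').write_text('')
    (submodule / '__init__.py').write_text('MARKER = "package-local statistics"\n')


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        # The package-local module is only reachable through its package.
        local_stats = importlib.import_module('demo_pkg.statistics')
        # An absolute import of the bare name still finds the stdlib module.
        import statistics as stdlib_stats
    finally:
        sys.path.remove(tmp)

average = stdlib_stats.mean([1, 2, 3])
```

The two modules only collide in a namespace if user code binds the bare name statistics from the package, which is the import style described above as unlikely.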

@vdusek
Collaborator Author

vdusek commented Mar 4, 2025

Closing, as all the breaking changes were resolved. (There are no breaking changes in #1024.)

@vdusek closed this as completed Mar 4, 2025