Add support for non-default unnamed storages #1175

vdusek opened this issue Apr 25, 2025 · 1 comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@vdusek
Collaborator

vdusek commented Apr 25, 2025

Problem

The Apify platform supports non-default unnamed storages. This functionality is also available in the Apify Python client, where you can do the following (example for dataset):

await DatasetCollectionClientAsync.get_or_create()

Each call creates a new, unnamed dataset with a unique ID.

In contrast, Crawlee does not support this (in any storage client). For example, repeated calls to:

await Dataset.open()

always return the same default unnamed storage.
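The difference between the two behaviors can be illustrated with a small in-memory model. This is only a sketch: ToyDatasetCollection and ToyCrawleeDataset are hypothetical stand-ins written for this example, not classes from the Apify client or Crawlee, and they only mimic the behavior described above.

```python
from __future__ import annotations

import uuid


class ToyDatasetCollection:
    """Hypothetical stand-in for a platform-style collection client."""

    def __init__(self) -> None:
        self.datasets: dict[str, dict] = {}  # dataset ID -> dataset record

    def get_or_create(self, name: str | None = None) -> dict:
        # With a name: return the existing dataset of that name, if any.
        if name is not None:
            for record in self.datasets.values():
                if record["name"] == name:
                    return record
        # Without a name (or with a new one): create a dataset with a fresh ID.
        record = {"id": uuid.uuid4().hex, "name": name}
        self.datasets[record["id"]] = record
        return record


class ToyCrawleeDataset:
    """Hypothetical stand-in for the current open() behavior without arguments."""

    _default: dict | None = None

    @classmethod
    def open(cls) -> dict:
        # Every call without arguments resolves to the single default storage.
        if cls._default is None:
            cls._default = {"id": "default", "name": None}
        return cls._default


collection = ToyDatasetCollection()
first = collection.get_or_create()
second = collection.get_or_create()
print(first["id"] != second["id"])  # two distinct unnamed datasets
print(ToyCrawleeDataset.open() is ToyCrawleeDataset.open())  # same default storage
```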

Goal state

Achieve feature parity between Crawlee storages (all storage clients, including the ApifyStorageClient) and the Apify platform (API client) by adding support for non-default unnamed storages.

Possible solution

Introduce a new scope argument to the storage open() class method:

from typing import Literal

@classmethod
async def open(
    cls,
    name: str | None = None,
    id: str | None = None,
    scope: Literal['run', 'global'] = 'global',
) -> Dataset | KeyValueStore | RequestQueue:
    ...

  • scope='run' indicates a non-default unnamed storage.
  • scope='global' refers to globally named storages.
  • The name parameter cannot be entirely removed for run scope storages, as it's needed:
    • For the filesystem storage: to use as a directory name.
    • For Apify platform storage: to store the mapping of name -> ID in the default key-value store.
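To make the role of name under scope='run' concrete, here is a minimal sketch of the name -> ID mapping idea. Everything in it is an assumption made for illustration: ToyStorageBackend, run_aliases, and the record layout are hypothetical, and a plain dict stands in for the record that would live in the default key-value store.

```python
from __future__ import annotations

import uuid
from typing import Literal


class ToyStorageBackend:
    """Hypothetical in-memory backend; not a Crawlee or Apify class."""

    def __init__(self) -> None:
        self.storages: dict[str, dict] = {}    # storage ID -> storage record
        self.run_aliases: dict[str, str] = {}  # run-scope name -> storage ID

    def _create(self, name: str | None) -> dict:
        record = {"id": uuid.uuid4().hex, "name": name}
        self.storages[record["id"]] = record
        return record

    def open(self, name: str, scope: Literal["run", "global"] = "global") -> dict:
        if scope == "run":
            # The storage itself stays unnamed; the name lives only in the mapping.
            storage_id = self.run_aliases.get(name)
            if storage_id is None:
                storage_id = self._create(name=None)["id"]
                self.run_aliases[name] = storage_id
            return self.storages[storage_id]
        # Global scope: the name is the storage's real name.
        for record in self.storages.values():
            if record["name"] == name:
                return record
        return self._create(name=name)


backend = ToyStorageBackend()
run_storage = backend.open("results", scope="run")
global_storage = backend.open("results", scope="global")
print(run_storage["name"], global_storage["name"])  # None results
```

In this sketch the same name can refer to two different storages, one per scope, because the run-scope name never becomes the storage's own name.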

Behavior matrix...

Open storage by ID and name

  • Raise an exception.
  • Scope argument is ignored.

Open storage by ID

  • Opens an existing storage by ID.
  • How the scope argument should be handled here is an open question.

Open storage by name

  • Scope run:
    • Opens or creates a run-scope (non-default unnamed) storage.
      • name is used only internally to look up the storage; it is not the storage's actual name.
  • Scope global:
    • Opens or creates a global named storage.

Open storage without args

  • Opens the default unnamed storage.
  • Scope argument is ignored.
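
The matrix above can be written down as a small dispatch function. This is a sketch under stated assumptions, not proposed implementation code: resolve_open and the returned labels are hypothetical, and the by-ID branch ignores scope only as a placeholder, since that case is left open above.

```python
from __future__ import annotations

from typing import Literal

Scope = Literal["run", "global"]


def resolve_open(
    id: str | None = None,
    name: str | None = None,
    scope: Scope = "global",
) -> str:
    """Return a label for the action the behavior matrix prescribes."""
    if id is not None and name is not None:
        # ID and name together: raise, regardless of scope.
        raise ValueError("Pass either id or name, not both.")
    if id is not None:
        # Placeholder: scope handling for this case is undecided.
        return "open-existing-by-id"
    if name is not None:
        if scope == "run":
            return "open-or-create-run-scope-unnamed"
        return "open-or-create-global-named"
    # No arguments: the default unnamed storage; scope is ignored.
    return "open-default-unnamed"


print(resolve_open())                            # open-default-unnamed
print(resolve_open(name="results", scope="run")) # open-or-create-run-scope-unnamed
```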
@vdusek vdusek added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Apr 25, 2025
@vdusek vdusek self-assigned this Apr 25, 2025
@janbuchar
Collaborator

When opening the storage by ID, the scope does not make sense. I think an exception would be appropriate.
