Refactor Provider API to inject a storage directory #2139

Open
rhuss opened this issue May 12, 2025 · 0 comments

rhuss commented May 12, 2025

Our current provider configurations in templates use absolute paths (e.g., ~/.llama/distributions/...) for storage locations, which tightly couples configuration to the deployment environment. This makes containerized deployments, particularly in Kubernetes, unnecessarily complex, because those paths have to be mapped to a directory inside the home directory. The home directory itself is not well-defined when running in containers and depends on how the distribution container was built.

For example, templates currently contain:

providers:
  vector_io:
  - provider_id: sqlite-vec
    provider_type: inline::sqlite-vec
    config:
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/verification}/sqlite_vec.db

Proposed Change:
We should modify the Provider API to accept a storage_dir during instantiation, allowing providers to resolve relative paths from their configuration. This would transform the above into:

providers:
  vector_io:
  - provider_id: sqlite-vec
    provider_type: inline::sqlite-vec
    config:
      db_path: sqlite_vec.db  # Relative path

This change would significantly improve our deployment flexibility. In Kubernetes, for example, we could simply mount a persistent volume at the distribution storage directory without complex path mapping. It would also help in other environments like Windows, where the notion of a home directory is not so straightforward, or when people want to use external storage for the inline provider data. One could add a CLI option and/or environment variable to point to this storage directory (much like today's SQLITE_STORE_DIR), but without having to add those directly to the templates, which is hard to maintain because the paths also embed distribution names.
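To make the CLI option and environment variable idea concrete, here is a minimal sketch of how the storage directory could be chosen. The flag name `--storage-dir`, the variable `LLAMA_STACK_STORAGE_DIR`, and the default location are hypothetical names picked for illustration; none of them exist in the project today.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Hypothetical names for illustration only; not part of the existing CLI or templates.
STORAGE_DIR_ENV = "LLAMA_STACK_STORAGE_DIR"
DEFAULT_STORAGE_DIR = Path("~/.llama/storage")


def resolve_storage_dir(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the storage directory: CLI flag first, then env var, then default."""
    environ = os.environ if environ is None else environ

    # parse_known_args ignores unrelated arguments instead of failing on them.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--storage-dir", type=Path, default=None)
    args, _unknown = parser.parse_known_args(argv)

    if args.storage_dir is not None:
        chosen = args.storage_dir
    elif environ.get(STORAGE_DIR_ENV):
        chosen = Path(environ[STORAGE_DIR_ENV])
    else:
        chosen = DEFAULT_STORAGE_DIR

    # Expand a leading "~" so callers always receive a usable path.
    return chosen.expanduser()
```

With something like this, a Kubernetes deployment would only need to set the one variable (or flag) to the mount point of the persistent volume, and the templates themselves would stay free of environment-specific paths.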

It would also make our configurations more portable across environments and easier to version control, as they'd no longer contain environment-specific absolute paths.

The implementation would need to maintain backward compatibility while providing clear documentation and validation for the new relative path pattern. We should also consider how this affects existing deployments and what migration strategy would be appropriate. Backward compatibility could work by simply checking whether the configured path is relative (the new way) or absolute (the current way).

The provider change could look like

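from pathlib import Path
from typing import Any, Dict, Union

# Current Provider API
class Provider:
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.db_path = config.get('db_path')  # Absolute path

# Proposed Provider API
class Provider:
    def __init__(self, config: Dict[str, Any], storage_dir: Path):
        self.config = config
        self.storage_dir = storage_dir
        self.db_path = self._resolve_path(config.get('db_path'))

    def _resolve_path(self, path: Union[str, Path]) -> Path:
        # Expand "~" first: pathlib does not treat "~/..." as absolute
        path = Path(path).expanduser()
        if path.is_absolute():
            # Existing absolute paths keep working unchanged
            return path
        # New-style relative paths are resolved against the injected directory
        return self.storage_dir / path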
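The validation mentioned earlier could be layered on top of the same relative-versus-absolute check. The following is only a sketch under assumptions: the function name `resolve_storage_path` is hypothetical, and rejecting relative paths that escape the storage directory is one possible validation rule, not something the proposal has settled on.

```python
from pathlib import Path
from typing import Union


def resolve_storage_path(storage_dir: Path, configured: Union[str, Path]) -> Path:
    """Resolve a configured path, keeping legacy absolute paths unchanged."""
    path = Path(configured).expanduser()
    if path.is_absolute():
        # Legacy configuration: use the path exactly as given.
        return path

    base = storage_dir.resolve()
    candidate = (base / path).resolve()
    try:
        # relative_to raises ValueError when candidate lies outside base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(
            f"{configured!r} escapes the storage directory {str(base)!r}"
        ) from None
    return candidate
```

A relative value such as `sqlite_vec.db` would end up inside the storage directory, while something like `../outside.db` would be rejected with a `ValueError` instead of silently writing elsewhere.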

Would appreciate thoughts on this approach and if this makes sense.

Originally posted by @rhuss in #2056

rhuss changed the title from "Our current provider configurations in templates use absolute paths (e.g., ~/.llama/distributions/...) for storage locations, creating a tight coupling between configuration and deployment environment. This makes e.g. containerized deployments, particularly in Kubernetes, unnecessarily complex as we need to map those paths to something else as a directory within the home directory (which is also not super well-defined when running in containers and depends how you've built your distribution container)" to "Refactor Provider API to inject a storage directory" on May 12, 2025