Skip to content

Refactor storage access configuration handling #1504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 5, 2025

Conversation

dimas-b
Copy link
Contributor

@dimas-b dimas-b commented May 1, 2025

This is a step towards supporting non-AWS S3 storage, but this refactoring is relevant to all storage backends.

There is no change to existing behaviours.

  • Rename PolarisCredentialProperty to IcebergStorageAccessProperty and introduce non-credential properties (as an example for now)

  • IcebergStorageAccessProperty values are ultimately meant to be produced by PolarisStorageIntegration implementations

  • Some previous entries in IcebergStorageAccessProperty are not really credential properties, but their treatment is not changed in this PR to maintain exactly the same bahaviour as before.

  • Add AccessConfig to represent both credential and non-credential properties related to storage access.

Copy link
Contributor

@collado-mike collado-mike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a breaking change without a clear reason. What's the purpose in separating the credentials from extraProperties?

Comment on lines +26 to +28
Map<String, String> credentials();

Map<String, String> extraProperties();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the need for breaking this out into separate fields? They all end up in the same map for every single caller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main idea is to isolate credential and non-credential properties in REST API responses. Specifically in IcebergCatalogHandler.

This PR does not cause any functional change, but prepares for exposing properties like s3.endpoint to clients later.

These properties are not always mixed together. PR #1225 brings some distinction in load table responses.

@dimas-b
Copy link
Contributor Author

dimas-b commented May 1, 2025

Seems like a breaking change without a clear reason

@collado-mike : Could you clarify what is broken by this PR? I believe there's no behaviour change.

@dimas-b
Copy link
Contributor Author

dimas-b commented May 1, 2025

What's the purpose in separating the credentials from extraProperties?

The purpose is to allow not mixing credential and non-credential properties in loadTableResponse per #1225

Other renames in this PR are meant to clarity code-level distinction between credentials and non-credential properties related to storage access.

@collado-mike
Copy link
Contributor

@collado-mike : Could you clarify what is broken by this PR? I believe there's no behaviour change.

Sorry, I misread the cache code without closely reading the StorageCredentialCacheEntry changes. You're right, there are no behavior changes here.

I understand the reasoning for the change now and I agree.

collado-mike
collado-mike previously approved these changes May 1, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 1, 2025
@dimas-b
Copy link
Contributor Author

dimas-b commented May 1, 2025

@eric-maynard @dennishuo : Please have a look too :)

Copy link
Collaborator

@adnanhemani adnanhemani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite like this refactor, it makes lots of sense to do this. One small, stylistic comment.

One minor question, why the name IcebergStorageAccessProperty? I understand that all tables today are Iceberg, but in the future if they were not, would this class still only be relevant to them only?

import org.apache.polaris.core.storage.PolarisStorageConfigurationInfo;
import org.apache.polaris.core.storage.PolarisStorageIntegration;
import org.apache.polaris.core.storage.StorageLocation;
import org.apache.polaris.core.storage.*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should revert this to ensure no wildcard imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - fixed

@github-project-automation github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board May 2, 2025
Comment on lines 35 to 37
AWS_ENDPOINT(String.class, "s3.endpoint", "the S3 endpoint to use for requests", false),
AWS_PATH_STYLE_ACCESS(
Boolean.class, "s3.path-style-access", "whether to use S3 path style access", false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@eric-maynard
Copy link
Contributor

The semantic changes look good to me, but there's also a rename happening here that looks a bit unrelated to those changes. I'd like to understand the intent behind the rename a little more if possible.

This is a step towards supporting non-AWS S3 storage, but this
refactoring is relevant to all storage backends.

There is no change to existing behaviours.

* Rename PolarisCredentialProperty to IcebergStorageAccessProperty
  and introduce non-credential properties (as an example for now)

* IcebergStorageAccessProperty values are ultimately meant to be
  produced by PolarisStorageIntegration implementations

* Some previous entries in IcebergStorageAccessProperty are not really
  credential properties, but their treatment is not changed in this
  PR to maintain exactly the same bahaviour as before.

* Add AccessConfig to represent both credential and non-credential
  properties related to storage access.
* <p>Most of these properties are meant to configure Iceberg FileIO objects for accessing data in
* storage.
*/
public enum StorageAccessProperty {
AWS_KEY_ID(String.class, "s3.access-key-id", "the aws access key id"),
AWS_SECRET_KEY(String.class, "s3.secret-access-key", "the aws access key secret"),
AWS_TOKEN(String.class, "s3.session-token", "the aws scoped access token"),
AWS_SESSION_TOKEN_EXPIRES_AT_MS(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this considered "credential"?

Copy link
Collaborator

@adnanhemani adnanhemani May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Key ID, Secret Key, Session Token, and expiry time is definitely together considered a "credential" for AWS. Not sure if there's a reason why you're asking this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it is part of the credential bunch of properties.

Copy link
Contributor

@eric-maynard eric-maynard May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS_SESSION_TOKEN_EXPIRES_AT_MS by itself is definitely not a credential. But it is a property related to credential vending and access to object storage, and so I think having it here makes sense.

Copy link
Collaborator

@adnanhemani adnanhemani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Excited to see S3-compatible storage support coming together. Thanks for working on this.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 3, 2025
@dimas-b dimas-b merged commit 4db7998 into apache:main May 5, 2025
6 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board May 5, 2025
@dimas-b dimas-b deleted the refactor-access-config branch May 5, 2025 17:08
@ryanovas
Copy link

ryanovas commented May 5, 2025

With the other issue arrive S3-compatible storage closed, and this step towards it merged, where can we keep up with the overall progress of S3-compatible storage now?

@dimas-b
Copy link
Contributor Author

dimas-b commented May 6, 2025

Thanks for the reminder @ryanovas . I've just filed a fresh issue for this : #1530 (since #32 has large scope)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants