-
Notifications
You must be signed in to change notification settings - Fork 230
Refactor storage access configuration handling #1504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a breaking change without a clear reason. What's the purpose in separating the credentials from extraProperties
?
Map<String, String> credentials(); | ||
|
||
Map<String, String> extraProperties(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the need for breaking this out into separate fields? They all end up in the same map for every single caller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea is to isolate credential and non-credential properties in REST API responses. Specifically in IcebergCatalogHandler.
This PR does not cause any functional change, but prepares for exposing properties like s3.endpoint
to clients later.
These properties are not always mixed together. PR #1225 brings some distinction in load table responses.
@collado-mike : Could you clarify what is broken by this PR? I believe there's no behaviour change. |
The purpose is to allow not mixing credential and non-credential properties in Other renames in this PR are meant to clarity code-level distinction between credentials and non-credential properties related to storage access. |
Sorry, I misread the cache code without closely reading the I understand the reasoning for the change now and I agree. |
@eric-maynard @dennishuo : Please have a look too :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quite like this refactor, it makes lots of sense to do this. One small, stylistic comment.
One minor question, why the name IcebergStorageAccessProperty
? I understand that all tables today are Iceberg, but in the future if they were not, would this class still only be relevant to them only?
import org.apache.polaris.core.storage.PolarisStorageConfigurationInfo; | ||
import org.apache.polaris.core.storage.PolarisStorageIntegration; | ||
import org.apache.polaris.core.storage.StorageLocation; | ||
import org.apache.polaris.core.storage.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should revert this to ensure no wildcard imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - fixed
polaris-core/src/main/java/org/apache/polaris/core/storage/IcebergStorageAccessProperty.java
Outdated
Show resolved
Hide resolved
AWS_ENDPOINT(String.class, "s3.endpoint", "the S3 endpoint to use for requests", false), | ||
AWS_PATH_STYLE_ACCESS( | ||
Boolean.class, "s3.path-style-access", "whether to use S3 path style access", false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The semantic changes look good to me, but there's also a rename happening here that looks a bit unrelated to those changes. I'd like to understand the intent behind the rename a little more if possible. |
This is a step towards supporting non-AWS S3 storage, but this refactoring is relevant to all storage backends. There is no change to existing behaviours. * Rename PolarisCredentialProperty to IcebergStorageAccessProperty and introduce non-credential properties (as an example for now) * IcebergStorageAccessProperty values are ultimately meant to be produced by PolarisStorageIntegration implementations * Some previous entries in IcebergStorageAccessProperty are not really credential properties, but their treatment is not changed in this PR to maintain exactly the same bahaviour as before. * Add AccessConfig to represent both credential and non-credential properties related to storage access.
b9755b3
to
fa3e82a
Compare
* <p>Most of these properties are meant to configure Iceberg FileIO objects for accessing data in | ||
* storage. | ||
*/ | ||
public enum StorageAccessProperty { | ||
AWS_KEY_ID(String.class, "s3.access-key-id", "the aws access key id"), | ||
AWS_SECRET_KEY(String.class, "s3.secret-access-key", "the aws access key secret"), | ||
AWS_TOKEN(String.class, "s3.session-token", "the aws scoped access token"), | ||
AWS_SESSION_TOKEN_EXPIRES_AT_MS( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this considered "credential"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Key ID, Secret Key, Session Token, and expiry time is definitely together considered a "credential" for AWS. Not sure if there's a reason why you're asking this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it is part of the credential bunch of properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AWS_SESSION_TOKEN_EXPIRES_AT_MS
by itself is definitely not a credential. But it is a property related to credential vending and access to object storage, and so I think having it here makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Excited to see S3-compatible storage support coming together. Thanks for working on this.
With the other issue arrive S3-compatible storage closed, and this step towards it merged, where can we keep up with the overall progress of S3-compatible storage now? |
This is a step towards supporting non-AWS S3 storage, but this refactoring is relevant to all storage backends.
There is no change to existing behaviours.
Rename PolarisCredentialProperty to IcebergStorageAccessProperty and introduce non-credential properties (as an example for now)
IcebergStorageAccessProperty values are ultimately meant to be produced by PolarisStorageIntegration implementations
Some previous entries in IcebergStorageAccessProperty are not really credential properties, but their treatment is not changed in this PR to maintain exactly the same bahaviour as before.
Add AccessConfig to represent both credential and non-credential properties related to storage access.