Spec: Add SigV4 Auth Support for Catalog Federation #1506
Conversation
spec/polaris-management-service.yml (Outdated)
properties:
  roleArn:
    type: string
    description: The aws IAM role arn assume when signing requests
Does this assume the use of STS?
Mere SigV4 can be done with just a keyId/keySecret pair... why is it not an option? AFAIK, there are S3 implementations that do not have STS but support SigV4.
Yes, this assumes the use of STS. While SigV4 can technically work with just a keyID/keySecret pair, that's not how it works in Polaris.
Let me break it down a bit:
Polaris acts as the service provider, and it has its own IAM user with an AWS credential (key ID and key secret). However, this IAM user is owned by the Polaris service, not by the Polaris user. Since IAM doesn't allow one AWS account to grant privileges directly to an IAM user belonging to another AWS account, Polaris wouldn't be able to access the Polaris user's Glue catalog that way.
So how does Polaris access a user's Glue catalog? By assuming the IAM role provided by the Polaris user. This lets the Polaris service temporarily inherit the permissions tied to that role, essentially gaining the necessary access without the user having to expose long-lived credentials.
Now, why are there S3 implementations that don't use STS?
That's very common on the query engine side. In that case, the query engine is fully managed by the Polaris user themselves. They can create an IAM user + access key pair, grant that IAM user privileges to their storage, and configure their engine to use that credential directly. No need for STS in that scenario.
So here's the key difference:
- AWS user credentials (key ID/secret) are long-lived and tied to a user. IAM roles aren't real credentials; they're just a set of permissions. To access resources, Polaris assumes a role and obtains short-lived temporary credentials via STS. This is much more secure.
- In the query engine use case, both the engine and the storage are owned by the same identity (the Polaris user), so they're free to use long-lived user credentials if they want. But the Polaris service provider and Polaris users come from two different groups.
The IAM role assumption model is actually consistent with how Polaris accesses S3 storage in general. In Polaris S3 storage configs, users provide an IAM role, Polaris assumes that role, and gets subscoped temporary credentials via STS to access the storage.
The Glue Catalog documentation also recommends using IAM roles to manage access:
https://docs.aws.amazon.com/glue/latest/dg/configure-iam-for-glue.html
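For illustration, here is a minimal sketch of the role-assumption step described above, using the AWS SDK for Java v2. This is not the actual Polaris code; the role ARN, session name, and external ID are placeholder values.

```java
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.AssumeRoleResponse;
import software.amazon.awssdk.services.sts.model.Credentials;

public class AssumeRoleSketch {
  public static void main(String[] args) {
    // The STS client authenticates as the Polaris service identity,
    // e.g. via the default AWS credentials chain.
    try (StsClient sts = StsClient.create()) {
      AssumeRoleResponse response =
          sts.assumeRole(
              AssumeRoleRequest.builder()
                  // Placeholder values; in Polaris these would come from the connection config.
                  .roleArn("arn:aws:iam::123456789012:role/customer-glue-access")
                  .roleSessionName("polaris-federation")
                  .externalId("optional-external-id")
                  .build());

      // Short-lived credentials scoped to the assumed role's permissions.
      Credentials creds = response.credentials();
      System.out.println("Temporary key id: " + creds.accessKeyId() + ", expires: " + creds.expiration());
    }
  }
}
```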
Thanks for the info. Re: Polaris acts as the service provider ...
while I agree that it's a possible use case, I do not think Apache Polaris needs to be restricted to just that use case. I can imagine users who'd want to deploy and use Polaris in a controlled environment and have ownership both of the AWS storage and Polaris servers.
That, of course, does not invalidate the security best practices that you referenced.
Given that the new connection config is apparently meant to enable connections to AWS services and not to generic S3 implementations, I'd propose to rename the config type to something like AWSSTSAuthenticationParameters to avoid ambiguity (with type code AWSSTS). WDYT?
I think we should keep it consistent with the Iceberg SDK: AuthProperties.java#30
Also, getting the temporary AWS credential from STS is just the first step. The actual operation involves signing the request using the SigV4 protocol. Given that, I'd prefer we stick with SIGV4 as the type name; it better reflects the authentication mechanism being used.
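To make those two steps concrete, here is a rough sketch using the AWS SDK for Java's SigV4 signer. The endpoint URI is a placeholder; in practice the Iceberg SDK's RESTSigV4Signer performs this signing internally.

```java
import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.signer.Aws4Signer;
import software.amazon.awssdk.auth.signer.params.Aws4SignerParams;
import software.amazon.awssdk.http.SdkHttpFullRequest;
import software.amazon.awssdk.http.SdkHttpMethod;
import software.amazon.awssdk.regions.Region;

public class SigV4SigningSketch {
  // Step 1 produced temporary session credentials via STS AssumeRole;
  // step 2 signs each REST request with them using the SigV4 protocol.
  static SdkHttpFullRequest sign(AwsSessionCredentials tempCreds) {
    SdkHttpFullRequest unsigned =
        SdkHttpFullRequest.builder()
            .method(SdkHttpMethod.GET)
            .uri(URI.create("https://example-catalog.execute-api.us-east-1.amazonaws.com/v1/config"))
            .build();

    Aws4SignerParams params =
        Aws4SignerParams.builder()
            .awsCredentials(tempCreds)   // from AssumeRole, not long-lived keys
            .signingName("execute-api")  // or "glue" / "s3tables"
            .signingRegion(Region.US_EAST_1)
            .build();

    // Adds the Authorization, X-Amz-Date, and session-token headers.
    return Aws4Signer.create().sign(unsigned, params);
  }
}
```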
But I agree that Apache Polaris might be deployed in a controlled environment where users own both the AWS storage and the Polaris servers. In that setup, using an IAM role isn't strictly necessary. Long-lived credentials are often read from environment variables or command-line configs.
That said, we shouldn't recommend storing these long-lived credentials directly in the catalog entity and persisting them in the metastore. It's a security risk, and it's generally best practice to keep credentials out of persistent metadata wherever possible. (Maybe we can leverage the UserSecretsManager added by @dennishuo and use a UserSecretReference to refer to that long-lived credential.)
WDYT?
To clarify my point on naming: indeed, the actual authentication mechanism on the wire is going to be SigV4, however, the user-facing configuration and requirements involve STS and role assumption. This is more specific than SigV4. In other words, users who configure this authentication method have to be aware of STS and roles. The basic inputs into SigV4 are ID/secret/session-token, but this auth config is higher level.
Therefore, from my POV using a name that highlights reliance on higher-level concepts like STS would be beneficial and provide clarity to users.
I do not feel too strongly about this, though :)
Or would it be better to have a subtype for SigV4, e.g. STS or STATIC_CREDS? WDYT? This would cover both scenarios.
Replied on the dev list as well, but I think for STATIC_CREDS we can probably follow the same convention we started with SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION for StorageConfig, where we can be more opinionated about subscoping design for the "managed multi-tenant" use cases, while allowing a simpler approach for open-ended "self-managed, single-tenant" use cases to just delegate to environment variables and/or default client SDK behaviors.
In particular, we can take a stand on whether we ever actually want Polaris to run in a mode that mixes some catalogs that do subscoping with other catalogs that just use static credentials through environment variables or similar. The problem is that the subscoping is trying to promise certain guarantees of isolation, and mixing the two might fundamentally violate its assumptions. So it might be best to really bifurcate the two use cases by essentially making the behavior choice a top-level server config (e.g. SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION) instead of trying to express it as an "option" in the ConnectionConfigType.
That said, I don't feel too strongly about whether to make the auth type more STS-specific, since in theory there could still be other non-static-creds flows for SigV4 that also don't use STS. It could be SIGV4_STS or STS_SIGV4. But for now I'm still not adding any particular type for static creds.
Alternatively, if anyone feels strongly that even in the static creds case we need an explicit ConnectionType for it (maybe in addition to a server-level config), we at the very least need to add a SUPPORTED_CONNECTION_AUTH_TYPES, just like how we have a SUPPORTED_CATALOG_STORAGE_TYPES, because in multi-tenant configurations it'll be important to block the ability for catalogs to be configured to use open-ended static creds.
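Purely as an illustration of what such a guard could look like (the class and wiring below are hypothetical, not the actual Polaris feature-configuration API):

```java
import java.util.Set;

// Hypothetical sketch: a multi-tenant deployment configures the set of allowed
// connection auth types (analogous to SUPPORTED_CATALOG_STORAGE_TYPES) and rejects
// catalog configs that request anything else, e.g. open-ended static creds.
public class ConnectionAuthTypeValidator {
  private final Set<String> supportedAuthTypes;

  public ConnectionAuthTypeValidator(Set<String> supportedAuthTypes) {
    this.supportedAuthTypes = supportedAuthTypes;
  }

  public void validate(String requestedAuthType) {
    if (!supportedAuthTypes.contains(requestedAuthType)) {
      throw new IllegalArgumentException(
          "Unsupported connection authentication type: " + requestedAuthType
              + "; allowed: " + supportedAuthTypes);
    }
  }

  public static void main(String[] args) {
    ConnectionAuthTypeValidator validator = new ConnectionAuthTypeValidator(Set.of("SIGV4"));
    validator.validate("SIGV4");        // ok
    validator.validate("STATIC_CREDS"); // throws in a deployment that only allows SIGV4
  }
}
```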
Hey Dennis, thanks for bringing this up, I didn't realize there was already a solution in place for storage config.
We can definitely follow the same pattern for connection config: skip assuming the role and just read the credentials from the command line. I also looked into the Iceberg SDK: if we don't explicitly pass credentials in the catalog properties when initializing the RESTCatalog, the SDK falls back to the AWS SDK's DefaultCredentialsProvider, which picks up credentials from the environment variables.
Code pointers:
RESTSigV4Signer.java#L91
AwsProperties.java#L417-L435
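A hedged sketch of that fallback behavior (the endpoint is a placeholder, and the exact property keys for enabling SigV4 may differ across Iceberg versions): initializing a RESTCatalog with SigV4 signing enabled but no explicit keys lets the AWS SDK's default credential chain supply them.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.rest.RESTCatalog;

public class SigV4RestCatalogSketch {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("uri", "https://example-catalog.execute-api.us-east-1.amazonaws.com"); // placeholder
    props.put("rest.sigv4-enabled", "true");       // enable SigV4 request signing
    props.put("rest.signing-region", "us-east-1");
    props.put("rest.signing-name", "execute-api");
    // No explicit access key / secret key properties here: the RESTSigV4Signer falls back to
    // the AWS SDK DefaultCredentialsProvider (env vars, profiles, instance roles, ...).

    RESTCatalog catalog = new RESTCatalog();
    catalog.initialize("federated", props);
  }
}
```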
Agreed that there could be other non-static-creds flows (e.g. for the Top Secret Region, we need to call the CAP service to get the credentials; it's a replacement for STS). Instead of creating many top-level types for SigV4, let's not add any particular type for SigV4 auth.
I will add SUPPORTED_CONNECTION_AUTH_TYPES in a follow-up PR.
Mind giving this PR a stamp?
cc: @dimas-b @dennishuo
I have the same comments as @dimas-b and @singhpk234 - but this should be good for now!
Hi folks, any other concerns about this spec change? There is one remaining thing we haven't made a decision on, namely how static credentials should be provided:
1. From env vars or server config
2. Configured via the Polaris Management API
@XJDKC - sorry, I know I had previously approved this proposal, but reading through some of the comments, I'm questioning something: why do we require an IAM User for Polaris? My understanding is that today, Polaris already has access to some IAM Role credentials (through the Default Credential Provider) - why do we need to introduce the new complexity of an IAM User? If we assume that Polaris has access to the IAM Role, then we no longer need the workaround for cross-account access either. Can you clarify this bit? On non-STS SigV4 authN, I don't really have a strong preference - and I don't think adopting any of the options makes this PR a one-way door. So I'm okay to punt on it for now.
Hey @adnanhemani, there is an ongoing discussion about this PR: https://lists.apache.org/thread/rlbxvw0xmzvlfm7pdh97bs3xvq7o8lmy This thread also contains some details about it: #1506 (comment) In short, to get IAM Role credentials, Polaris needs to use a long-lived credential (representing the Polaris service identity) to assume the role (provided by Polaris users) and get a temporary credential. This is the solution provided by AWS IAM to grant permissions to AWS account B (the Polaris service provider) from AWS account A (the Polaris users).
I believe the STS credentials concern is not specific to this PR. We can probably deal with that in another PR.
spec/polaris-management-service.yml (Outdated)
userArn:
  type: string
  description: The aws user arn used to assume the aws role, this represents the polaris service itself
  example: "arn:aws:iam::123456789001:user/polaris-service-user"
I believe userArn is not part of the Assume Role API call in STS... unless I'm missing something. So this parameter seems redundant in the connection config.
This property is used to expose the Polaris service's userArn to Polaris users, so they can update the trust relationship of their IAM role to allow Polaris to assume the role they've provided.
AWS doc for editing trust relationships: Editing the trust relationship for an IAM role
From the user's perspective, the flow looks like this:
- Create a Polaris Catalog entity that includes a connection config specifying their IAM roles.
- Polaris injects its IAM user ARN into the connection config.
- Users call the getCatalog API to retrieve catalog details, including the Polaris service identity (userArn).
- Users update the trust relationship on their IAM roles, allowing Polaris to assume those roles (see the sketch after this list).
  - For storage config: the IAM role needs permissions to access the bucket.
  - For connection config: the IAM role needs permissions to access their catalog (e.g., Glue).
- Polaris assumes the IAM role and gets temporary credentials with the role's permissions.
- Polaris uses those temporary credentials to sign and send requests to the Glue Catalog (or other catalog service).
Storage config also has this property.
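As an illustration of the trust-relationship step above, here is a sketch with placeholder names; users can equally edit the trust policy in the AWS console as described in the linked doc.

```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.iam.IamClient;
import software.amazon.awssdk.services.iam.model.UpdateAssumeRolePolicyRequest;

public class TrustPolicySketch {
  public static void main(String[] args) {
    // The userArn surfaced by Polaris (value taken from the spec's example).
    String polarisServiceUserArn = "arn:aws:iam::123456789001:user/polaris-service-user";

    // Trust policy allowing the Polaris service identity to assume the user's role.
    // The ExternalId condition is optional and mirrors the externalId connection property.
    String trustPolicy =
        """
        {
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "AWS": "%s" },
            "Action": "sts:AssumeRole",
            "Condition": { "StringEquals": { "sts:ExternalId": "optional-external-id" } }
          }]
        }
        """
            .formatted(polarisServiceUserArn);

    try (IamClient iam = IamClient.builder().region(Region.AWS_GLOBAL).build()) {
      iam.updateAssumeRolePolicy(
          UpdateAssumeRolePolicyRequest.builder()
              .roleName("customer-glue-access") // placeholder role name
              .policyDocument(trustPolicy)
              .build());
    }
  }
}
```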
Sorry, @XJDKC : Is it not the user who invokes the Admin API to save this config? The caller of the API must already know the value, right?
> Storage config also has this property.
I believe this property is equally misleading in the storage config too 🙂
If the service owner needs to inform users about userArn, I believe we need to find another mechanism. Adding properties that are not used by Polaris code to configuration objects that are meant to affect Polaris behaviour is not a sound approach, I think.
Right, that flow is indeed how the StorageConfigInfo works today for managed Polaris services that bind their own service-owned IAM User at the time of the CreateCatalog call; in general the binding of the IAM User to a StorageConfig (or a ConnectionConfig) may not be known until the CreateCatalog actually runs.
I think the problems with trying to do it at step 1 are:
- The binding might be dynamic and transactionally performed during the CreateCatalog
- There's no separate API to express pre-planning for intending to create a Federated Catalog
It's true that if the service provider has very static base IAM Users or if the user interaction is very manual (e.g. if it's a small-scale/informal use case where an internal "user" DMs the "service provider" on Slack), then the userArn can be known already before creating the catalog.
Looks like there was some discussion about this already in #1191
I know @XJDKC has proposed having a stronger delineation between user-provided and service-provided properties, and I agree it's worth cleaning up.
The "not used by Polaris" might just be a shortcoming of the current default implementations -- in a way we should be enforcing consistency of the userArn when setting up the stsClient
in
polaris/service/common/src/main/java/org/apache/polaris/service/storage/StorageConfiguration.java
Line 63 in 40aea3a
default Supplier<StsClient> stsClientSupplier() { |
default Supplier<StsClient> stsClientSupplier() {
return Suppliers.memoize(
() -> {
StsClientBuilder stsClientBuilder = StsClient.builder();
if (awsAccessKey().isPresent() && awsSecretKey().isPresent()) {
LoggerFactory.getLogger(StorageConfiguration.class)
.warn("Using hard-coded AWS credentials - this is not recommended for production");
StaticCredentialsProvider awsCredentialsProvider =
StaticCredentialsProvider.create(
AwsBasicCredentials.create(awsAccessKey().get(), awsSecretKey().get()));
stsClientBuilder.credentialsProvider(awsCredentialsProvider);
}
return stsClientBuilder.build();
});
}
But the default implementation is just kept simple by basically taking a single root IAM User from the runtime environment that ends up being the userArn used implicitly for all StorageConfigurations in the service.
One step toward expanding that architecture in a way that better illustrates how the fields should actually be used by the code would be to allow something like setting per-realm root IAM Users or IAM Roles in the Quarkus config to act as the "service identity" that will assumeRole on the user-specified catalog-level roleArn. Then the stsClient Supplier would initially authenticate as the realm-specific "service identity" that matches what's in the Catalog's properties.
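A rough sketch of that direction, purely illustrative (the ServiceIdentity shape and the realm-to-identity map are hypothetical, not existing Polaris configuration):

```java
import java.util.Map;
import java.util.function.Supplier;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.services.sts.StsClient;

public class RealmAwareStsClients {
  // Hypothetical per-realm service identity loaded from server-side (e.g. Quarkus) config.
  record ServiceIdentity(String userArn, String accessKey, String secretKey) {}

  private final Map<String, ServiceIdentity> identitiesByRealm;

  public RealmAwareStsClients(Map<String, ServiceIdentity> identitiesByRealm) {
    this.identitiesByRealm = identitiesByRealm;
  }

  // The returned client authenticates as the realm-specific service identity,
  // which should match the userArn surfaced in that realm's catalog configs.
  public Supplier<StsClient> stsClientSupplierFor(String realmId) {
    ServiceIdentity identity = identitiesByRealm.get(realmId);
    return () ->
        StsClient.builder()
            .credentialsProvider(
                StaticCredentialsProvider.create(
                    AwsBasicCredentials.create(identity.accessKey(), identity.secretKey())))
            .build();
  }
}
```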
Looks like the concept of "read-only" or "write-only" fields are actually first-class in OpenAPI: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#fixed-fields-20
readOnly:
Relevant only for Schema "properties" definitions. Declares the property as "read only". This means that it MAY be sent as part of a response but SHOULD NOT be sent as part of the request. If the property is marked as readOnly being true and is in the required list, the required will take effect on the response only. A property MUST NOT be marked as both readOnly and writeOnly being true. Default value is false.
writeOnly:
Relevant only for Schema "properties" definitions. Declares the property as "write only". Therefore, it MAY be sent as part of a request but SHOULD NOT be sent as part of the response. If the property is marked as writeOnly being true and is in the required list, the required will take effect on the request only. A property MUST NOT be marked as both readOnly and writeOnly being true. Default value is false.
Maybe we should experiment with some of those additional annotations? Interestingly, writeOnly conveys well other patterns we've had, such as inline secrets, and readOnly would be for fields used for communicating things decided by the server backend back to the user.
Thanks for the explanation, @dennishuo !
It looks like this is not so much about using a two-stage Assume Role approach in Polaris, but more about the source and target audience of userArn. Right now it is clear that the user is not the source but a consumer of this value.
I believe it would be clearer if this was represented in another config object, for example a ConnectionConfigInfo.accessDeclarations containing userArn.
Whether or not Polaris does a two-stage AssumeRole is not directly relevant to the AuthenticationParameters. These parameters define what Polaris needs to do in order to gain access to the user's resources. userArn is a declaration Polaris makes about the identity it will use for accessing the external resources. Whether Polaris is authorized to use userArn is beyond the scope of this config. It is declared to the user by Polaris in order for the user to make appropriate grants.
WDYT?
Hey Dmitri and Dennis, thanks for the suggestions! Extracting the service-provided properties into a nested field and annotating it with readOnly both make a lot of sense to me. I'll update the spec to reflect these changes.
Done!
LGTM given the consistency with StorageConfig, though I agree we should probably refine the way we convey "output-only" concepts like the userArn, and get agreement with other reviewers on how best to convey that distinction before merging this PR (maybe simply annotating with the OpenAPI field readOnly is enough, or we could go further and move the readOnly fields into a separate tuple such as vendorInfo like you suggested elsewhere).
Hi folks, I've updated the spec to extract the Polaris service identity info into a nested object. Right now, I've added support for two types of service identities.
The service identity info is part of the SigV4AuthenticationParameters; it's read-only and will be surfaced to Polaris users in the response of the getCatalog API. Looking ahead, we may introduce additional service identity types (e.g. GCP Service Account) and aim to unify the behavior between storage config and connection config for consistency. I have drafted a proposal for credential management in Apache Polaris and would love to hear your thoughts, feedback, and suggestions.
cc: @dennishuo @dimas-b
Context
Prior Catalog Federation work:
Reference
Below are all the references mentioned in this PR:
polaris/polaris-core/src/main/java/org/apache/polaris/core/storage/aws/AwsCredentialsStorageIntegration.java
Line 68 in ca35bba
Description
This PR adds SigV4 auth support so that Polaris can federate to Glue IRC, S3Tables, or a catalog server hosted behind the AWS API Gateway.
Details
User-provided properties:
- roleArn: the AWS IAM role ARN; Polaris will assume this role to get a temporary AWS session credential
- roleSessionName: the role session name to be used by the SigV4 protocol for signing requests
- externalId: an optional external ID used to establish a trust relationship with AWS in the trust policy
- signingRegion: the region to be used by the SigV4 protocol for signing requests
- signingName: the service name to be used by the SigV4 protocol for signing requests (e.g. glue, s3tables, execute-api); the default signing name is "execute-api" if not provided
Service-provided properties:
- userArn: the AWS user ARN used to assume the AWS role; this represents the Polaris service itself (the overall steps are described in the conversation above)

Related Iceberg REST signing properties:
- rest.signing-region
- rest.signing-name
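For illustration, here is a hypothetical connection config fragment putting these properties together. The wrapper field and type names ("connectionType", "uri", "authenticationType") are an assumption based on this discussion rather than the final spec, and the values are placeholders.

```java
public class SigV4ConnectionConfigExample {
  // Hypothetical createCatalog payload fragment; the wrapper field names are assumed
  // for illustration only, while roleArn/roleSessionName/externalId/signingRegion/signingName
  // correspond to the user-provided properties listed above.
  static final String CONNECTION_CONFIG_JSON =
      """
      {
        "connectionType": "ICEBERG_REST",
        "uri": "https://example-catalog.execute-api.us-east-1.amazonaws.com",
        "authenticationParameters": {
          "authenticationType": "SIGV4",
          "roleArn": "arn:aws:iam::123456789012:role/customer-glue-access",
          "roleSessionName": "polaris-federation",
          "externalId": "optional-external-id",
          "signingRegion": "us-east-1",
          "signingName": "execute-api"
        }
      }
      """;
}
```

In a getCatalog response, the service-provided identity (e.g. the userArn) would additionally appear as a read-only nested object, per the readOnly discussion above.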