-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-5018: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5019
base: master
Are you sure you want to change the base?
Conversation
ritazh
commented
Jan 3, 2025
- One-line PR description: DRAAdminAccess: allow creations of ResourceClaims and ResourceClaimTemplates in privileged mode to grant access to devices that are in use by other users for admin tasks like monitor health or status of the device.
- Issue link: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5018
- Other comments:
// | ||
// +required | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we update any stale API definitions in a separate PR so there isn't a suggesting that KEP-5018 is responsible for the change from required to optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate commit for this change is good enough for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #5087 is merged, this diff will go away.
Signed-off-by: Rita Zhang <[email protected]>
910fe6f
to
677a7ed
Compare
/assign @pohly |
/sig auth |
/sig node |
|
||
1. Grants privileged access to the requested device: | ||
|
||
For requests with `adminAccess: true`, the DRA controller bypasses standard allocation checks and allows administrators to access devices already in use. This ensures privileged tasks like monitoring or diagnostics can be performed without disrupting existing allocations. The controller also logs and audits admin-access requests for security and traceability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an assumption that monitoring/diagnostics processes are already running on the underlying host OS, and so any CPU/memory allocation can be guaranteed? And/or do we assume that any new operational headroom required by these privileged tasks is already pre-accounted for, no chance for container allocation to take up 100% of the available headroom of a node's host OS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field does not influence CPU or memory consumption. Those monitoring pods must declare their resources and they will get set aside for them when the pods get scheduled.
|
||
|
||
## Motivation | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important to note that regular users are expected to have write access to resourceclaims / resourceclaimtemplates, which is the main reason we care about protecting access to a privileged field inside those objects.
It looks like those permissions haven't been added to the normal namespaced user edit
role yet ... @pohly, is that expected to happen as part of the main DRA feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I don't know how "normal" permissions are "normally" (sorry, couldn't resist) defined. I assumed that this would be handled by admins or cloud providers.
But if we have sane upstream defaults, then yes, those should have been modified as part of core DRA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will follow up on this in a separate issue.
``` | ||
1. Authorization Check: | ||
|
||
In the REST storage layer, validate requests to create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`. Only authorize if namespace has the `kubernetes.io/dra-admin-access` label. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
control over a specific field based on some authorization or policy configuration seems more like admission than a REST-layer operation... almost all the examples I can think of like this are implemented as admission plugins:
- PodSecurity
- ResourceQuota
- CertificateSigning
- CertificateApproval
- OwnerReferencesPermissionEnforcement
- CertificateSubjectRestriction
- ClusterTrustBundleAttest
I think this actually layers in more cleanly as an admission plugin, default-on, can be disabled by cluster admin (surface area of a namespace label is unchanged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritazh, you had mentioned earlier that there is prior art for doing this at the REST layer when I asked the same question for your prototype. Where is that prior art?
If we keep doing it as proposed now, that probably belongs into the KEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is one example https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/rbac/escalation_check.go#L100-L101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my recommendation to use the REST storage layer and not admission, because this isn't something that I think a cluster admin should be able to disable. The cluster admin can make it so that anyone can apply labels to namespaces, but they can't prevent the check from running. In the same way that the RBAC escalation checks cannot be disabled, but the cluster admin is free to grant authorization to all users to meet those checks.
The authz logic for AllowUnsafeMalformedObjectDeletion
could have been implemented in admission, but it instead lives in the REST handler where it cannot be disabled.
Perhaps @deads2k has opinions.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ritazh The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some "obsolete" discussion threads still apply.
Signed-off-by: Rita Zhang <[email protected]>
42b6ee2
to
59bbe9c
Compare
Signed-off-by: Rita Zhang <[email protected]>
kep-number: 5018 | ||
authors: | ||
- "@ritazh" | ||
owning-sig: sig-node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: move this whole thing under /sig-auth after review is done and before merging per January 21st wg-device-management community call.
Signed-off-by: Rita Zhang <[email protected]>