Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-5018: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5019

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ritazh
Copy link
Member

@ritazh ritazh commented Jan 3, 2025

  • One-line PR description: DRAAdminAccess: allow creations of ResourceClaims and ResourceClaimTemplates in privileged mode to grant access to devices that are in use by other users for admin tasks like monitor health or status of the device.
  • Other comments:

NOTE: The DRAAdminAccess feature was initially part of https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters In 1.32, the DRAAdminAccess feature gate was added to keep the adminAccess field in alpha while promoting structured parameters to beta. It was discussed this feature should be part of a separate KEP to push it forward. Most references to DRAAdminAccess are removed from the original KEP and moved to this KEP.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 3, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 3, 2025
//
// +required
// +optional
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we update any stale API definitions in a separate PR so there isn't a suggesting that KEP-5018 is responsible for the change from required to optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate commit for this change is good enough for me.

Copy link
Member Author

@ritazh ritazh Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #5087 is merged, this diff will go away.

Signed-off-by: Rita Zhang <[email protected]>
@kubernetes kubernetes deleted a comment from k8s-ci-robot Jan 3, 2025
@ritazh
Copy link
Member Author

ritazh commented Jan 3, 2025

/assign @pohly

@ritazh
Copy link
Member Author

ritazh commented Jan 3, 2025

/sig auth

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jan 3, 2025
@ritazh
Copy link
Member Author

ritazh commented Jan 6, 2025

/sig node


1. Grants privileged access to the requested device:

For requests with `adminAccess: true`, the DRA controller bypasses standard allocation checks and allows administrators to access devices already in use. This ensures privileged tasks like monitoring or diagnostics can be performed without disrupting existing allocations. The controller also logs and audits admin-access requests for security and traceability.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an assumption that monitoring/diagnostics processes are already running on the underlying host OS, and so any CPU/memory allocation can be guaranteed? And/or do we assume that any new operational headroom required by these privileged tasks is already pre-accounted for, no chance for container allocation to take up 100% of the available headroom of a node's host OS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field does not influence CPU or memory consumption. Those monitoring pods must declare their resources and they will get set aside for them when the pods get scheduled.

keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/4381-dra-structured-parameters/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
@liggitt liggitt added the api-review Categorizes an issue or PR as actively needing an API review. label Jan 16, 2025


## Motivation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to note that regular users are expected to have write access to resourceclaims / resourceclaimtemplates, which is the main reason we care about protecting access to a privileged field inside those objects.

It looks like those permissions haven't been added to the normal namespaced user edit role yet ... @pohly, is that expected to happen as part of the main DRA feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know how "normal" permissions are "normally" (sorry, couldn't resist) defined. I assumed that this would be handled by admins or cloud providers.

But if we have sane upstream defaults, then yes, those should have been modified as part of core DRA.

Copy link
Member Author

@ritazh ritazh Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will follow up on this in a separate issue.

```
1. Authorization Check:

In the REST storage layer, validate requests to create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`. Only authorize if namespace has the `kubernetes.io/dra-admin-access` label.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

control over a specific field based on some authorization or policy configuration seems more like admission than a REST-layer operation... almost all the examples I can think of like this are implemented as admission plugins:

  • PodSecurity
  • ResourceQuota
  • CertificateSigning
  • CertificateApproval
  • OwnerReferencesPermissionEnforcement
  • CertificateSubjectRestriction
  • ClusterTrustBundleAttest

I think this actually layers in more cleanly as an admission plugin, default-on, can be disabled by cluster admin (surface area of a namespace label is unchanged)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ritazh, you had mentioned earlier that there is prior art for doing this at the REST layer when I asked the same question for your prototype. Where is that prior art?

If we keep doing it as proposed now, that probably belongs into the KEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my recommendation to use the REST storage layer and not admission, because this isn't something that I think a cluster admin should be able to disable. The cluster admin can make it so that anyone can apply labels to namespaces, but they can't prevent the check from running. In the same way that the RBAC escalation checks cannot be disabled, but the cluster admin is free to grant authorization to all users to meet those checks.

The authz logic for AllowUnsafeMalformedObjectDeletion could have been implemented in admission, but it instead lives in the REST handler where it cannot be disabled.

Perhaps @deads2k has opinions.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ritazh
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some "obsolete" discussion threads still apply.

keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
keps/sig-node/5018-dra-adminaccess/README.md Outdated Show resolved Hide resolved
Signed-off-by: Rita Zhang <[email protected]>
Signed-off-by: Rita Zhang <[email protected]>
kep-number: 5018
authors:
- "@ritazh"
owning-sig: sig-node
Copy link
Member Author

@ritazh ritazh Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: move this whole thing under /sig-auth after review is done and before merging per January 21st wg-device-management community call.

Signed-off-by: Rita Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Assigned
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

7 participants