KEP-127 (UserNS): allow customizing subids length #5020
base: master
Conversation
The number of subuids and subgids for each pod is hard-coded to 65536, regardless of the total ID count specified in `/etc/subuid` and `/etc/subgid`: https://github.com/kubernetes/kubernetes/blob/v1.32.0/pkg/kubelet/userns/userns_manager.go#L211-L228 This is not enough for some images. Nested containerization also needs a huge number of subids. Signed-off-by: Akihiro Suda <[email protected]>
The mapping length (multiple of 65536) will be customizable via a new `KubeletConfiguration` property `subidsPerPod`.
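As a rough sketch of what the proposal implies (this is hypothetical illustration, not the actual kubelet code; the function name and validation behavior are assumptions), the hard-coded per-pod length would become a validated configuration value:

```go
package main

import "fmt"

// Hypothetical sketch, not the actual kubelet code: the per-pod ID
// mapping length, hard-coded to 65536 today, would instead come from
// a KubeletConfiguration field subidsPerPod, validated as a multiple
// of 65536 as the KEP proposes.
const idsPerPodDefault = 65536

func mappingLength(subidsPerPod uint32) (uint32, error) {
	if subidsPerPod == 0 {
		// Field unset: keep today's default behavior.
		return idsPerPodDefault, nil
	}
	if subidsPerPod%idsPerPodDefault != 0 {
		return 0, fmt.Errorf("subidsPerPod %d is not a multiple of %d", subidsPerPod, idsPerPodDefault)
	}
	return subidsPerPod, nil
}

func main() {
	n, _ := mappingLength(2 * idsPerPodDefault)
	fmt.Println(n)
}
```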
I'd got the impression we might want to make the mapping size configurable on a per-Pod basis.
(what if you have a particular Pod that assigns a (POSIX) ID to each user, and you have 42000000 users, but all your other Pods only need 65000 UIDs?)
I think it's possible but not a common case IMO, and implementing a pod API field would be much more complex than adding a kubelet configuration field. I'm not sure the maintenance burden is worth it.
So long as we're not accidentally tying ourselves into not being able to extend the Pod API in the future. If we are tying ourselves, let's make sure we'd never want the option.
What about introducing a Pod security context property like `securityContext.userNS.staticMappingWithUsername: "foo"`? This would run `getsubids foo` to obtain the subID range, and assign the entire range to the Pod. (So, this is different from `getsubids kubelet`, which returns the total range for the 110 pods.) Multiple pods may use the same range at their own risk. This allows assigning an extremely large subID range.
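To make the proposal concrete, a minimal sketch of the lookup it implies (illustrative only: the parsing function and its error handling are assumptions, not kubelet code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical sketch of the proposed lookup: the kubelet would run
// `getsubids foo` (from shadow-utils) and parse a line of the form
//   "0: foo 100000 65536"
// into a (start, length) pair covering the user's entire subID range.
// The function name and error handling are illustrative only.
func parseGetSubIDs(line string) (start, length uint64, err error) {
	fields := strings.Fields(line)
	if len(fields) != 4 {
		return 0, 0, fmt.Errorf("unexpected getsubids output: %q", line)
	}
	if start, err = strconv.ParseUint(fields[2], 10, 64); err != nil {
		return 0, 0, err
	}
	if length, err = strconv.ParseUint(fields[3], 10, 64); err != nil {
		return 0, 0, err
	}
	return start, length, nil
}

func main() {
	start, length, _ := parseGetSubIDs("0: foo 100000 65536")
	fmt.Println(start, length)
}
```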
is the max ID range inside of a container flexible? as in: could we have a kubelet field that toggles a dynamic range, and have the runtime interpret the range in the image?
Flexible. A container may use a UID that is not present in `/etc/passwd` in the image, so a runtime cannot "interpret the range in the image".
It should still be possible to have OCI Image annotations declare the range of needed UIDs.
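A sketch of what such an annotation check could look like. The annotation key, its format, and the rounding policy are all hypothetical; no such key exists in the OCI image spec:

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical OCI image annotation (no such key is standardized):
// an image could declare the highest UID it needs, and the runtime
// or kubelet could size the user namespace mapping accordingly.
const annoMaxUID = "org.example.max-uid" // illustrative name only

// requiredIDs returns the mapping length implied by the annotation,
// rounded up to the next multiple of 65536, or the default of 65536
// when the annotation is absent.
func requiredIDs(annotations map[string]string) (uint64, error) {
	v, ok := annotations[annoMaxUID]
	if !ok {
		return 65536, nil
	}
	maxUID, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("bad %s: %w", annoMaxUID, err)
	}
	return (maxUID/65536 + 1) * 65536, nil
}

func main() {
	n, _ := requiredIDs(map[string]string{annoMaxUID: "100000"})
	fmt.Println(n)
}
```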
how do we prevent such a field from being abused? An image could claim all the available IDs and prevent other pods from being created
Admission-time checks are where I'd start; also ResourceQuota and LimitRange specifically.
> prevents that other pods can be created

No, not with `securityContext.userNS.staticMappingWithUsername: "foo"`, which allows ID conflicts and requires explicit configuration of the securityContext.
This should probably still be prohibited for the Restricted Pod Security Standard.
@AkihiroSuda can you please elaborate on what is needed for the use case? We have several options, none is perfect with the info we have, but with more info on the use case we might be able to make a better decision. For example, one option proposed here is to use bigger ranges for all pods. That might work or not, depending on how big the ranges need to be (it may be that we can't run the number of maxPods configured for the node if the ranges are very big). Another option is to use the pod.spec as you suggested, but we need to think about abuses as @giuseppe was mentioning. Can you share more details on the use case, so we can see what might be the best way to tackle it?
e.g., nesting containers (Docker, Kubernetes, BuildKit, whatever) inside Kubernetes without full privileges
@AkihiroSuda cool! And what are the number of IDs needed for each?
So, let's see what the needs for each of those are and think of ways to support them. With this partial info, correct me if I'm wrong, I understand that all will work fine if we support a multiple of 65536 as the length. That simplifies a lot of things, so I'd like to keep that. I'm not sure whether a kubelet config or pod.spec is the right place to choose this. Not sure about what granularity we want to expose for these pods with "wide mappings". My thinking is:
What do you think? My gut feeling is that `subidsPerPod` as a kubelet config can get the job done here. It's hard for me to see if it will fall short in the future or not, though, so more opinions are very welcome :)
I still feel kubelet config is sufficient personally
is there a way to do that (eg: a config field named …)? If you've one Pod that wants to use UID 999999999, it's a shame if you also have to give at least that many UIDs to every other Pod. It's painful even if you dedicate a couple of nodes for that component and run the other nodes with 65536.
@haircommander You are right. Having seen the use cases and slept on it last night, I agree: that should be more than enough for now, and the door is open if down the road we need to add a pod.spec field
LGTM.
@giuseppe what do you think?
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AkihiroSuda, rata

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
/lgtm FYI to reviewers: I was also hoping to move this KEP to on-by-default beta in 1.33