Pagination from cache KEP #5017
As still some pagination requests will be delegated to etcd, we will monitor the
success rate by measuring the pagination cache hit vs miss ratio.

Consideration: Should we start respecting the limit parameter?
Not sure I understand - are we not respecting the limit parameter in the current iteration?
Currently the API server doesn't respect limit when serving RV="0": https://github.com/kubernetes/kubernetes/blob/6746df77f2376c6bc1fd0de767d2a94e6bd6cec1/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L806-L818
I think we should consider re-enabling limit for consistency; however, we need to better understand the consequences, in particular the impact on client and server when we cannot serve pagination from cache, as with an L7 LB or pagination taking more than 75s.
I'm not worried about clients: a user setting limit should already be prepared to handle pagination, as it is required when not setting RV.
For setups with L4 loadbalancer apiserver can be configured with Goaway, which
requests client reconnects periodically, however per request probability should
be configured around 0.1%.
Is there a reason for 0.1% here?
0.1% is the recommended configuration for GOAWAY; see kubernetes/kubernetes#88567
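To make the 0.1% figure concrete, here is a rough back-of-the-envelope sketch. It assumes the GOAWAY chance is applied independently to each request on a connection, which is a modeling assumption for illustration and not something stated in this thread; the function names are hypothetical.

```python
def expected_requests_until_goaway(p: float) -> float:
    """Mean of a geometric distribution: expected number of requests
    up to and including the one that receives GOAWAY."""
    return 1.0 / p


def survival_probability(p: float, n: int) -> float:
    """Probability that a connection serves n requests without a GOAWAY,
    assuming each request is an independent trial."""
    return (1.0 - p) ** n


p = 0.001  # 0.1% per request, the value discussed above
print(expected_requests_until_goaway(p))  # about 1000 requests on average
print(survival_probability(p, 1000))      # a bit over one third
```

Under that assumption, a client is asked to reconnect about once per thousand requests, so a paginated LIST of a few pages is unlikely to be moved to another apiserver mid-pagination.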
For L7 loadbalancer the default algorithm usually is round-robin. For most LBs
Trying to better understand this: is the worst case here as follows (assume 3 API servers A, B, C)?
1. Client hits API server A: assuming the RV is cached, a snapshot is created on receiving a LIST request with a limit parameter set.
2. Client hits API server B: no snapshot present, so we delegate to etcd.
3. Client hits API server C: no snapshot present, so we delegate to etcd.
So the performance degenerates to the current situation without cached pagination, with a slight improvement for (1)? And if (1) also delegates to etcd when the RV isn't cached, then performance degenerates to the current scenario of no cached pagination?
This ^ assumes all servers are on a version that supports pagination from the cache. If one server is on a minor version without that support, my understanding is that requests hitting it would again be delegated to etcd.
Right, there is no regression, assuming that:
- We will delegate continue requests that we don't have cached responses for to etcd.
- We will keep the current behavior where the API server doesn't respect limit for RV="0".
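The worst case discussed in this thread can be sketched as a toy simulation. This is only an illustrative model, not the cacher's implementation: `Server`, `paginate_round_robin`, and `hit_ratio` are hypothetical names, the first page is assumed to always be served from cache and to create a snapshot, and a server that delegates to etcd is assumed not to create one.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    # Resource versions for which this server holds a snapshot (toy model).
    snapshots: set = field(default_factory=set)

    def serve(self, rv: int, is_continue: bool) -> str:
        """Return 'cache' if served from the watch cache, else 'etcd'."""
        if not is_continue:
            # Assumption: the RV is cached, so the first page creates a snapshot.
            self.snapshots.add(rv)
            return "cache"
        if rv in self.snapshots:
            return "cache"
        # No snapshot on this server: delegate, as happens today.
        return "etcd"


def paginate_round_robin(servers, pages, rv=100):
    """Send each page of one paginated LIST to the next server in turn."""
    results = []
    for i in range(pages):
        server = servers[i % len(servers)]
        results.append((server.name, server.serve(rv, is_continue=i > 0)))
    return results


def hit_ratio(results) -> float:
    """Fraction of requests served from cache (the hit vs miss metric)."""
    if not results:
        return 0.0
    hits = sum(1 for _, source in results if source == "cache")
    return hits / len(results)


results = paginate_round_robin([Server("A"), Server("B"), Server("C")], pages=3)
print(results, hit_ratio(results))
```

In this model only the first page is a cache hit with three servers behind a round-robin L7 LB, while a single server (or sticky routing) serves every page from cache; in neither case is a request served worse than with etcd-backed pagination.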
Signed-off-by: Madhav Jivrajani <[email protected]>
kep-4988: flesh out cached pagination procedure
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: serathius
Create first draft of #4988 as provisional.
/cc @wojtek-t @deads2k @MadhavJivrajani @jpbetz