Skip to content

Make HPA behavior configurable #431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

djahandarie
Copy link

@djahandarie djahandarie commented Mar 8, 2025

What this PR does / why we need it:

This allows the end-user to specified desired v2 HPA behavior, instead of using the default HPA behavior specified by tortoise. This is needed because currently tortoise's scale up behavior is very aggressive, which results in poor cost performance on workloads which are bursty and don't need that much aggressive scaling to meet their latency/throughput targets.

Which issue(s) this PR fixes:

Partially fixes #341 (this PR makes it possible for the end-user to prevent too much scaling up but does not put a safeguard in Tortoise by default like that issue is written to suggest)

Special notes for your reviewer:

  • I don't like the fact the nil check dance is duplicated, but I found nowhere else in the code where defaults can be set in one spot (Default in tortoise_webhook.go seemed close, but it seems to not be called in all code paths, as the tests failed).
  • Since this new field is optional and at the end of the protobuf, I guess there is no break in the API; is this understanding right?
  • This seems to be the first addition to the Tortoise spec that is simply just fully exposing a k8s API object, is this okay? (I think it is, but just want to raise it as a discussion point)

@sanposhiho
Copy link
Collaborator

#433 will fix the ci

@djahandarie
Copy link
Author

@sanposhiho CI should be passing now, PTAL when you have a moment 🙏

Copy link
Collaborator

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a new test case or edit existing test case in the controller level test to ensure this behaviour?

https://github.com/mercari/tortoise/blob/main/internal/controller/tortoise_controller_test.go

@sanposhiho
Copy link
Collaborator

@AVSBharadwaj @jrbangit @AdityaK011 This one as well - Do you want to take a look, as a reviewing training?
See ↓ for the basic rules that you must check on reviewing:
https://github.com/mercari/tortoise/blob/main/docs/contributor-guide.md

@djahandarie djahandarie requested a review from AdityaK011 as a code owner March 15, 2025 10:23
@djahandarie
Copy link
Author

@sanposhiho Added 🙏

@sanposhiho
Copy link
Collaborator

My free time is occupied for Kubernetes code freeze and Kubecon preparation. I wouldn't be able to take time until all of them are completed - mid of April.
@djahandarie contact @AVSBharadwaj @jrbangit @AdityaK011 internally if you want to make it merged sooner. Either way, I hope they get some experience around reviewing things on tortoise

Copy link
Collaborator

@AVSBharadwaj AVSBharadwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM. Might need to refactor a bit to incorporate the configuration.

@AVSBharadwaj
Copy link
Collaborator

AVSBharadwaj commented Apr 15, 2025

@sanposhiho @randytqwjp @jrbangit I have merged commit from my PR to this Pr itself and also added support for configuring HPABehavior from the config.yaml file.

@AVSBharadwaj AVSBharadwaj requested a review from sanposhiho April 17, 2025 07:00
@AVSBharadwaj
Copy link
Collaborator

will wait for another review/approve from code owners

Copy link
Collaborator

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits only

// Validate HPA behavior if specified
if config.DefaultHPABehavior != nil {
// Validate scale up policies
if config.DefaultHPABehavior.ScaleUp != nil && len(config.DefaultHPABehavior.ScaleUp.Policies) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Looks better to create a separate function validateDefaultHPA, and then call it from here.
Where are these validations coming from, by the way? Copied from the upstream Kubernetes?

Copy link
Collaborator

@AVSBharadwaj AVSBharadwaj May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -72,6 +74,7 @@ func New(
tortoiseHPATargetUtilizationMaxIncrease: tortoiseHPATargetUtilizationMaxIncrease,
recorder: recorder,
tortoiseHPATargetUtilizationUpdateInterval: tortoiseHPATargetUtilizationUpdateInterval,
defaultHPABehavior: defaultHPABehavior,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put defaultHPABehavior (defined at L224) if defaultHPABehavior (at L55) is nil. Then, we can remove getDefaultHPABehavior function because Service.defaultHPABehavior would be always non-nil

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -361,5 +385,50 @@ func validate(config *Config) error {
}
}

// Validate HPA behavior if specified
if config.DefaultHPABehavior != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

limit the number of vertical scaling up that could have happened for a certain period of time
4 participants