Skip to content

feat: add support for hugepages_config #2223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

DrFaust92
Copy link
Contributor

@DrFaust92 DrFaust92 commented Dec 21, 2024

Closes #2320

@DrFaust92 DrFaust92 requested review from ericyz and a team as code owners December 21, 2024 18:18
@DrFaust92 DrFaust92 changed the title feat: add support for hugepages_config + fix setting cgroup_name via "all"" feat: add support for hugepages_config Dec 21, 2024
@apeabody
Copy link
Collaborator

/gcbrun

Copy link
Collaborator

@apeabody apeabody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @DrFaust92!

Can you please add to an example for test coverage, otherwise other than some extra spaces, this LGTM.

NOTE: hugepages_config added in v6.5.0

@apeabody apeabody self-assigned this Dec 24, 2024
@DrFaust92 DrFaust92 requested a review from apeabody January 2, 2025 15:19
@DrFaust92
Copy link
Contributor Author

@apeabody Tried my best with tests here as well 😊 . LMK if something needs fixing

@apeabody
Copy link
Collaborator

apeabody commented Jan 2, 2025

/gcbrun

@apeabody
Copy link
Collaborator

apeabody commented Jan 2, 2025

FYI @DrFaust92 - A few of the pools do use n1, but the rest are currently e2.

TestNodePool 2025-01-02T23:41:57Z command.go:185: Error: error creating NodePool: googleapi: Error 400: 1G hugepages was set but the nodepool machine type(e2-medium) does not support this configuration.
TestNodePool 2025-01-02T23:41:57Z command.go:185: Details:
TestNodePool 2025-01-02T23:41:57Z command.go:185: [
TestNodePool 2025-01-02T23:41:57Z command.go:185:   {
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "@type": "type.googleapis.com/google.rpc.RequestInfo",
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "requestId": "0x7b26367c30d761e2"
TestNodePool 2025-01-02T23:41:57Z command.go:185:   }
TestNodePool 2025-01-02T23:41:57Z command.go:185: ]
TestNodePool 2025-01-02T23:41:57Z command.go:185: , badRequest
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185:   with module.example.module.gke.google_container_node_pool.pools["pool-04"],
TestNodePool 2025-01-02T23:41:57Z command.go:185:   on ../../../modules/beta-public-cluster/cluster.tf line 597, in resource "google_container_node_pool" "pools":
TestNodePool 2025-01-02T23:41:57Z command.go:185:  597: resource "google_container_node_pool" "pools" {
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: Error: error creating NodePool: googleapi: Error 400: 1G hugepages was set but the nodepool machine type(e2-medium) does not support this configuration.
TestNodePool 2025-01-02T23:41:57Z command.go:185: Details:
TestNodePool 2025-01-02T23:41:57Z command.go:185: [
TestNodePool 2025-01-02T23:41:57Z command.go:185:   {
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "@type": "type.googleapis.com/google.rpc.RequestInfo",
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "requestId": "0x48e2d757411a4297"
TestNodePool 2025-01-02T23:41:57Z command.go:185:   }
TestNodePool 2025-01-02T23:41:57Z command.go:185: ]
TestNodePool 2025-01-02T23:41:57Z command.go:185: , badRequest
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185:   with module.example.module.gke.google_container_node_pool.pools["pool-01"],
TestNodePool 2025-01-02T23:41:57Z command.go:185:   on ../../../modules/beta-public-cluster/cluster.tf line 597, in resource "google_container_node_pool" "pools":
TestNodePool 2025-01-02T23:41:57Z command.go:185:  597: resource "google_container_node_pool" "pools" {
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: Error: error creating NodePool: googleapi: Error 400: 1G hugepages was set but the nodepool machine type(n1-standard-2) does not support this configuration.
TestNodePool 2025-01-02T23:41:57Z command.go:185: Details:
TestNodePool 2025-01-02T23:41:57Z command.go:185: [
TestNodePool 2025-01-02T23:41:57Z command.go:185:   {
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "@type": "type.googleapis.com/google.rpc.RequestInfo",
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "requestId": "0x16e561c8b65549ca"
TestNodePool 2025-01-02T23:41:57Z command.go:185:   }
TestNodePool 2025-01-02T23:41:57Z command.go:185: ]
TestNodePool 2025-01-02T23:41:57Z command.go:185: , badRequest
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185:   with module.example.module.gke.google_container_node_pool.pools["pool-03"],
TestNodePool 2025-01-02T23:41:57Z command.go:185:   on ../../../modules/beta-public-cluster/cluster.tf line 597, in resource "google_container_node_pool" "pools":
TestNodePool 2025-01-02T23:41:57Z command.go:185:  597: resource "google_container_node_pool" "pools" {
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: Error: error creating NodePool: googleapi: Error 400: 1G hugepages was set but the nodepool machine type(n1-standard-2) does not support this configuration.
TestNodePool 2025-01-02T23:41:57Z command.go:185: Details:
TestNodePool 2025-01-02T23:41:57Z command.go:185: [
TestNodePool 2025-01-02T23:41:57Z command.go:185:   {
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "@type": "type.googleapis.com/google.rpc.RequestInfo",
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "requestId": "0x40383fdeb6fa524a"
TestNodePool 2025-01-02T23:41:57Z command.go:185:   }
TestNodePool 2025-01-02T23:41:57Z command.go:185: ]
TestNodePool 2025-01-02T23:41:57Z command.go:185: , badRequest
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185:   with module.example.module.gke.google_container_node_pool.pools["pool-05"],
TestNodePool 2025-01-02T23:41:57Z command.go:185:   on ../../../modules/beta-public-cluster/cluster.tf line 597, in resource "google_container_node_pool" "pools":
TestNodePool 2025-01-02T23:41:57Z command.go:185:  597: resource "google_container_node_pool" "pools" {
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185: Error: error creating NodePool: googleapi: Error 400: 1G hugepages was set but the nodepool machine type(n1-standard-2) does not support this configuration.
TestNodePool 2025-01-02T23:41:57Z command.go:185: Details:
TestNodePool 2025-01-02T23:41:57Z command.go:185: [
TestNodePool 2025-01-02T23:41:57Z command.go:185:   {
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "@type": "type.googleapis.com/google.rpc.RequestInfo",
TestNodePool 2025-01-02T23:41:57Z command.go:185:     "requestId": "0x37ecaeacf733a8ea"
TestNodePool 2025-01-02T23:41:57Z command.go:185:   }
TestNodePool 2025-01-02T23:41:57Z command.go:185: ]
TestNodePool 2025-01-02T23:41:57Z command.go:185: , badRequest
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z command.go:185:   with module.example.module.gke.google_container_node_pool.pools["pool-02"],
TestNodePool 2025-01-02T23:41:57Z command.go:185:   on ../../../modules/beta-public-cluster/cluster.tf line 597, in resource "google_container_node_pool" "pools":
TestNodePool 2025-01-02T23:41:57Z command.go:185:  597: resource "google_container_node_pool" "pools" {
TestNodePool 2025-01-02T23:41:57Z command.go:185: 
TestNodePool 2025-01-02T23:41:57Z retry.go:99: Returning due to fatal error: FatalError{Underlying: error while running command: exit status 1; 

@DrFaust92
Copy link
Contributor Author

@apeabody should i change the node pools types to support the test case? or how should I approach this?

@apeabody
Copy link
Collaborator

apeabody commented Jan 3, 2025

@apeabody should i change the node pools types to support the test case? or how should I approach this?

The 2-m huge pages should be fine as-is, however for the 1-g perhaps just test a single node pool which has a supported machine type. Maybe change "pool-05" to c3-standard-4?

1-gigabtye-sized huge pages are only available on the A3, C2D, C3, C3A, C3D, C4, CT5E, CT5L, CT5LP, CT6E, H3, M2, M3, or Z3 [machine types](https://cloud.google.com/compute/docs/machine-resource).

@DrFaust92 DrFaust92 force-pushed the huge_pages branch 2 times, most recently from 99c20b9 to 51a780f Compare January 4, 2025 23:03
@DrFaust92
Copy link
Contributor Author

apeabody lets try now

@apeabody
Copy link
Collaborator

apeabody commented Jan 6, 2025

/gcbrun

@apeabody apeabody removed their assignment Feb 3, 2025
@apeabody
Copy link
Collaborator

apeabody commented Apr 3, 2025

/gcbrun

@DrFaust92
Copy link
Contributor Author

apeabody added try as requested

@rawanbadawi
Copy link

@apeabody My customer needs this feature, is it possible to approve the latest?

Thanks

@apeabody
Copy link
Collaborator

apeabody commented Apr 7, 2025

/gcbrun

@apeabody
Copy link
Collaborator

apeabody commented Apr 7, 2025

INT test results:

        	            	Error: Error in function call
        	            	
        	            	  on ../../../modules/beta-public-cluster/cluster.tf line 938, in resource "google_container_node_pool" "pools":
        	            	 938:             hugepage_size_1g = try(coalesce(local.node_pools_hugepage_size_1g[each.value["name"]], local.node_pools_hugepage_size_1g["all"], null))
        	            	    ├────────────────
        	            	    │ each.value["name"] is "pool-03"
        	            	    │ local.node_pools_hugepage_size_1g is object with 8 attributes
        	            	    │ local.node_pools_hugepage_size_1g["all"] is ""
        	            	
        	            	Call to function "try" failed: no expression succeeded:
        	            	- Error in function call (at
        	            	../../../modules/beta-public-cluster/cluster.tf:938,36-45)
        	            	  Call to function "coalesce" failed: no non-null, non-empty-string arguments.
        	            	
        	            	At least one expression must produce a successful result

Looks like try() can't be used in this situation, probably need to use a less elegant ternary, etc.

@DrFaust92 DrFaust92 marked this pull request as draft April 9, 2025 02:39
@DrFaust92 DrFaust92 marked this pull request as ready for review April 25, 2025 23:28
@DrFaust92
Copy link
Contributor Author

local plans looks ok 🙏

@apeabody
Copy link
Collaborator

/gcbrun

@DrFaust92
Copy link
Contributor Author

This seemed to have failed fast /: . my ability to test this is limited as i can at most check this as a side branch on a running cluster

@apeabody
Copy link
Collaborator

This seemed to have failed fast /: . my ability to test this is limited as i can at most check this as a side branch on a running cluster

Flaky run, an API failed to active in time. Re-running

@DrFaust92
Copy link
Contributor Author

apeabody any details?

@apeabody
Copy link
Collaborator

apeabody any details?

Looks like need to adjust the test:

Step #54 - "apply node-pool-local":         	Error Trace:	/builder/home/go/pkg/mod/github.com/gruntwork-io/[email protected]/modules/terraform/apply.go:34
Step #54 - "apply node-pool-local":         	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:571
Step #54 - "apply node-pool-local":         	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:630
Step #54 - "apply node-pool-local":         	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:669
Step #54 - "apply node-pool-local":         	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/utils/stages.go:31
Step #54 - "apply node-pool-local":         	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:669
Step #54 - "apply node-pool-local":         	Error:      	Received unexpected error:
Step #54 - "apply node-pool-local":         	            	FatalError{Underlying: error while running command: exit status 1; 
Step #54 - "apply node-pool-local":         	            	Error: error creating NodePool: googleapi: Error 400: 'pd-standard' with confidential storage mode: false is not compatible with the machine type c3-standard-4, please review the GCP online documentation for available persistent disk options.
Step #54 - "apply node-pool-local":         	            	Details:
Step #54 - "apply node-pool-local":         	            	[
Step #54 - "apply node-pool-local":         	            	  {
Step #54 - "apply node-pool-local":         	            	    "@type": "type.googleapis.com/google.rpc.RequestInfo",
Step #54 - "apply node-pool-local":         	            	    "requestId": "0x16c806feae5789fb"
Step #54 - "apply node-pool-local":         	            	  }
Step #54 - "apply node-pool-local":         	            	]
Step #54 - "apply node-pool-local":         	            	, badRequest
Step #54 - "apply node-pool-local":         	            	
Step #54 - "apply node-pool-local":         	            	  with module.example.module.gke.google_container_node_pool.pools["pool-05"],
Step #54 - "apply node-pool-local":         	            	  on ../../../modules/beta-public-cluster/cluster.tf line 644, in resource "google_container_node_pool" "pools":
Step #54 - "apply node-pool-local":         	            	 644: resource "google_container_node_pool" "pools" {
Step #54 - "apply node-pool-local":         	            	}

DrFaust92 added 6 commits May 5, 2025 19:19
Signed-off-by: drfaust92 <[email protected]>
Signed-off-by: drfaust92 <[email protected]>
Signed-off-by: drfaust92 <[email protected]>
Signed-off-by: drfaust92 <[email protected]>
Signed-off-by: drfaust92 <[email protected]>
@DrFaust92
Copy link
Contributor Author

apeabody Ive attempted to make changes to accommodate this node type

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support hugepages config in the linux node config block
3 participants