Max runtime for k8s job #4376

Triggered via pull request May 16, 2025 15:15
@Steboss
ready_for_review #1453
sbosisio/k8s_orphan
Status: Cancelled
Total duration: 6h 26m 55s
Artifacts: 40

ci.yaml

on: pull_request
metadata (0s)
bump-manifest (26s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (3m 31s)
arm64 / build-base / build-base (3m 42s)
amd64 / build-jax / build-jax (12m 19s)
arm64 / build-jax / build-jax (19m 34s)
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-eks
Matrix: amd64 / test-te-unit-a100 / run-unit-test
amd64 / test-jax / runner / launch-slurm-runner (1h 5m)
amd64 / test-nsys-jax-eks (4h 1m)
amd64 / test-te-unit-a100 / runner / launch-slurm-runner (1h 30m)
amd64 / build-maxtext / build-maxtext (9m 31s)
amd64 / build-upstream-t5x / build-upstream-t5x (7m 2s)
amd64 / build-axlearn / build-axlearn (6m 1s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / build-equinox / build-equinox (2s)
amd64 / test-nsys-jax / runner / launch-slurm-runner (1h 16m)
Matrix: arm64 / test-jax / run-unit-test (Waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-eks (Waiting for pending jobs)
Matrix: arm64 / test-te-unit-a100 / run-unit-test (Waiting for pending jobs)
arm64 / test-nsys-jax-eks (0s)
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-unit-a100 / runner / launch-slurm-runner
arm64 / build-maxtext / build-maxtext (11m 50s)
arm64 / build-upstream-t5x / build-upstream-t5x (11m 12s)
arm64 / build-axlearn / build-axlearn (6m 59s)
Matrix: arm64 / test-nsys-jax / run-unit-test (Waiting for pending jobs)
arm64 / build-equinox / build-equinox (7m 31s)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (16m 41s)
amd64 / test-axlearn-eks (6h 0m)
amd64 / test-axlearn-fuji-models-eks (4h 0m)
Matrix: amd64 / test-nsys-jax-archive
Matrix: arm64 / test-maxtext / maxtext-multinode (Waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (Waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (18m 7s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (14s)
amd64 / collect-docker-tags (0s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (0s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (Waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (15s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (0s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (17s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (0s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (5s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (0s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (2s)
merge-new-manifest (0s)
Matrix: publish-containers
finalize / workflow-badge (9s)
finalize / report (9s)
finalize / upload-badge (3s)
finalize / publish-badge (2s)

Annotations

2 errors and 1 warning
amd64 / build-equinox / build-equinox: A task was canceled.
amd64 / test-axlearn-eks: The job running on runner eks-2mr4m-runner-8lspr has exceeded the maximum execution time of 360 minutes.
amd64 / test-nsys-jax-eks: This job failure may be caused by using an out of date version of GitHub runner on your self-hosted runner. You are currently using GitHub runner version 2.323.0. Please update to the latest version 2.324.0.

Artifacts

Produced during runtime
Name | Size | Digest
artifact-axlearn-build-amd64 | 567 Bytes | sha256:1c1245cd0c0d84831b43c16b31aa8e64422af9261314f720251109541979f07b
artifact-axlearn-build-arm64 | 566 Bytes | sha256:bbac397b0f36060881d38b67ccf380e047c8383289a443ffa75b4957095be54b
artifact-base-build-amd64 | 566 Bytes | sha256:3779d719fa806edb7c147418ea793c2a3bbe782e6e9f552c522beb534864a093
artifact-base-build-arm64 | 566 Bytes | sha256:d32a0cd317354abd5d68293f9711558f754231e64d2e68948b33514d96d500f3
artifact-equinox-build-arm64 | 569 Bytes | sha256:1d70416814eb2ebb204d3f375a808427201ce46d9fd43f6bb18ea4aa675d8c0f
artifact-final-report | 3.75 KB | sha256:3b1000bf580a7a3aa2ec88516fa84bf67a5b70aa6b05ba53270a151a5247be3b
artifact-jax-build-amd64 | 553 Bytes | sha256:deafb5441e70c2b24ba949aa01f55693d7161f1605d2d82e64ac5ddca1c5bae4
artifact-jax-build-arm64 | 553 Bytes | sha256:c11c4ad0171edbeba5aed62ce7e554bd86a79ce79d14fad08a2d0827f6aabafb
artifact-maxtext-build-amd64 | 567 Bytes | sha256:9fd69b9eade5a3018fbdb2215501a57a06aa338904699e902621a587ec596a17
artifact-maxtext-build-arm64 | 569 Bytes | sha256:a9dd1e22c34153d41744b3551d57d7b7c4e5afab108adcc6cd26792bbe41ef0d
artifact-maxtext-test | 1.83 KB | sha256:390ddb279701a205eca00b8a6b2406bae4b83574ef0a181800cb316696682628
artifact-multigpu-test-transformerengine-15071739304-8gpu-unittest | 584 Bytes | sha256:a1b10bcaa20e7b5bfab8ffa76210c66af4fce6258e596070aa47c96da2433d0d
artifact-rosetta-build-t5x-amd64 | 584 Bytes | sha256:4bcfd83ea0f5624bdc7cac517047ccea846127d64a790001ceed23662af17372
artifact-rosetta-build-t5x-arm64 | 584 Bytes | sha256:893bd6ef347857c7c5e7dc5563c0c7d96172e069462812ac6a6c8ec9703d4308
artifact-rosetta-t5x-mgmn-test | 1.27 KB | sha256:dd1d4f109c10ee2cda9334d44f9a4028800aa9eae6bc6aa769e35a704238e1b1
artifact-t5x-build-amd64 | 569 Bytes | sha256:16db005e006b11142bd2098c526750704b962b0ff28972cbad1b03f25ba473bd
artifact-t5x-build-arm64 | 567 Bytes | sha256:a377d49f2587e4984d6928e77cacb5f82fe147e5947f97e9baf9093cba8bcb9a
artifact-workflow-metadata | 278 Bytes | sha256:4cf85f4058923b8357c730c65f3e5a45d30a2914ee90dd3b042b07e57c4892ab
bumped-manifest | 46.9 KB | sha256:97894ea6816d4ea3888ac8c4610d198a8a0a2ee4157a887358185f177fc28f46
final-axlearn | 263 Bytes | sha256:ecb806aefccbff85abe7bbf46324174111a7bec244261d2fd0a21efcbf3a14a5
final-base | 254 Bytes | sha256:bc29994b516ce5427b3efd30f1e18a943cd305d116ebd786dc6cbd719b325040
final-equinox | 263 Bytes | sha256:277f4af9d9c38369a904f0b7828d596aef0614ebef8bf56ceae04b40a98d783e
final-jax | 251 Bytes | sha256:9521e4819fde997323211021aa24dc6f8f00474bdfcd1593da24caa47b379baf
final-maxtext | 263 Bytes | sha256:5783c7f8c70f3e44ad86d4daa26f8d241b8d4278fc0c29af9e6bc77fe028e29f
final-t5x | 251 Bytes | sha256:38774025501ddecca6441e92d538ab27e69ed808250f1ac9b4086f3ee05d4eca
final-upstream-t5x | 277 Bytes | sha256:0a82d5a83bdd14e62322bbe160299f3c79dfad3eb5200f713a719f37c5b5ed92
jax-unit-test-A100 | 19.5 KB | sha256:2471a2b76171c017917275541ac8c364706bdd3bdafb100ca79e088ccf3a60ad
mealkit-axlearn | 272 Bytes | sha256:51622fd34070fa606d52964a3a76da0ae6d10021eeae951708ffc7cddf9b3aea
mealkit-equinox | 273 Bytes | sha256:fbea841423f13050c10438f69a41c81d8a2605fdec51fa021ce100202d8be179
mealkit-jax | 261 Bytes | sha256:6ed41a1a4b299b231d9dfc1cc0333e980f174ca95d1af8a708669f32423311bc
mealkit-maxtext | 272 Bytes | sha256:1d81151abcb868cf4c138dc279d091db6755bc23124d14195f2afcb85b4a3c85
mealkit-t5x | 261 Bytes | sha256:49c682f1e525cc6a5ddea3deb300dfa1166abcea90f3ebf6433ecce5e316c0bc
mealkit-upstream-t5x | 286 Bytes | sha256:5508c13a770825baa4f4bc11b7215f15fa8ce41be0967027ff908bfef9f1248a
nsys-jax-unit-test-A100 | 33.2 MB | sha256:a00b3cace5e54ca25a66aafcf8dae73e47a41c406f911df066c06247906ccbc1
rosetta-t5x-metrics-test-log | 1.03 KB | sha256:f06d8f52ee9827c3f311485b473baf9a4fc6891fd68436325066520c11d583aa
rosetta-t5x-vit-15071739304-VIT8G1N | 31.7 KB | sha256:0eb358ff8abafb393c641e75476894170859cc7163453d457e3549e609fb4469
te-unit-test-A100 | 933 KB | sha256:64e849471fd2ecaec50ad78eb9560d6226343f20934b0b0fea9909752f4e5d58
upstream-maxtext-15071739304-1DP2FSDP4TP1PP_single_process | 20 KB | sha256:c4cb26cb60d086925e869e8373bd455d6783f09acb2f720480a72cbc574b9fe0
upstream-maxtext-15071739304-2DP2FSDP2TP1PP | 28.1 KB | sha256:3001a356bd3aefb4f926ac148cac5908777634307fec9667ced16d2e0a893a29
upstream-maxtext-metrics-test-log | 1.82 KB | sha256:1d19ace6915e7305d59dae402107395f93253c96b08d7e1eefd2956d56259206