Whisper V3 Large Turbo – Words/Sec Capped at ~284? Bottleneck or Parallelism Limit? #2593
Closed · AryanSakhala started this conversation in Show and tell
During a benchmarking run, I noticed some strange behaviour from openai/whisper-large-v3-turbo: irrespective of the concurrency level or the sampling rate of the audio, throughput stayed constant at roughly 284 words/sec. Am I missing something?
Setup: the model is deployed with vLLM, behind an nginx load balancer.
The turbo architecture uses only 4 decoder layers (compared to 32 in Whisper Large), so I expected higher parallelism, but throughput seems capped. Is this a hardware bottleneck, or a parallelism limit of the model or serving stack? Would love to hear from others who've tried pushing this model to its limits.
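For context, here is a minimal sketch of how the words/sec figure can be measured. The thread-pool driver and the stub transcriber are illustrative, not my actual client; in a real run, `transcribe_fn` would call the vLLM endpoint over HTTP.

```python
# Hedged sketch: measure aggregate words/sec across a batch of
# concurrent transcription requests. All names here are illustrative.
import time
import concurrent.futures


def words_per_sec(transcripts, elapsed_sec):
    """Total words across all responses divided by wall-clock time."""
    total_words = sum(len(t.split()) for t in transcripts)
    return total_words / elapsed_sec


def run_benchmark(transcribe_fn, audio_files, concurrency=8):
    """transcribe_fn(path) -> transcript string.

    Fans the requests out over a thread pool and times the whole batch,
    which is what makes a flat words/sec curve across concurrency
    levels visible.
    """
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        transcripts = list(pool.map(transcribe_fn, audio_files))
    elapsed = time.perf_counter() - start
    return words_per_sec(transcripts, elapsed)


# Self-contained check with a stub transcriber (no network involved):
stub = lambda path: "hello world from " + path
rate = run_benchmark(stub, ["a.wav", "b.wav"], concurrency=2)
```

If the reported rate stays near ~284 as `concurrency` grows, the serving stack is saturating somewhere upstream of the decoder.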