Whisper V3 Large Turbo – Words/Sec Capped at ~284? Bottleneck or Parallelism Limit? #2593
Closed · AryanSakhala started this conversation in Show and tell
During a benchmarking run, I noticed some strange behaviour from openai/whisper-large-v3-turbo: irrespective of the concurrency level or the sampling rate of the audio, throughput stayed constant at roughly 284 words/sec. Am I missing something?
Setup: the model is deployed with vLLM, behind an nginx load balancer.
The turbo architecture uses only 4 decoder layers (compared to 32 in Whisper Large), so I expected higher parallelism, but throughput seems capped. Is this a hardware bottleneck, or a parallelism limit of the model or serving stack? Would love to hear from others who've tried pushing this model to its limits.
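For context, here is a minimal sketch of how the words/sec figure can be measured. The thread-pool driver and the stub transcriber are illustrative, not my actual client; in a real run, `transcribe_fn` would call the vLLM endpoint over HTTP.

```python
# Hedged sketch: measure aggregate words/sec across a batch of
# concurrent transcription requests. All names here are illustrative.
import time
import concurrent.futures


def words_per_sec(transcripts, elapsed_sec):
    """Total words across all responses divided by wall-clock time."""
    total_words = sum(len(t.split()) for t in transcripts)
    return total_words / elapsed_sec


def run_benchmark(transcribe_fn, audio_files, concurrency=8):
    """transcribe_fn(path) -> transcript string.

    Fans the requests out over a thread pool and times the whole batch,
    which is what makes a flat words/sec curve across concurrency
    levels visible.
    """
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        transcripts = list(pool.map(transcribe_fn, audio_files))
    elapsed = time.perf_counter() - start
    return words_per_sec(transcripts, elapsed)


# Self-contained check with a stub transcriber (no network involved):
stub = lambda path: "hello world from " + path
rate = run_benchmark(stub, ["a.wav", "b.wav"], concurrency=2)
```

If the reported rate stays near ~284 as `concurrency` grows, the serving stack is saturating somewhere upstream of the decoder.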