
performance difference between dd and elbencho #75

Open
jbd opened this issue Feb 19, 2025 · 1 comment

jbd commented Feb 19, 2025

Hello,

I'm trying to understand the performance difference between elbencho and dd, both using direct I/O on a local NVMe drive. dd is showing about 2 GB/s and elbencho around 900 MB/s.

$ elbencho --version
elbencho
 * Version: 3.0-26
 * Net protocol version: 3.0.23
 * Build date: Feb 19 2025 17:39:42
 * Included optional build features: backtrace corebind libaio libnuma ncurses syncfs syscallh 
 * Excluded optional build features: althttpsvc cuda cufile/gds hdfs mimalloc s3 s3crt 
 * System steady clock precision: 1e-09 sec

The elbencho run:

$ elbencho -w -b 4M -t 1 --blockvarpct 0 --direct -s 10g /local/scratch/tmp/file_elbencho 
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :     11.559s     11.559s
            IOPS             :         221         221
            Throughput MiB/s :         885         885
            Total MiB        :       10240       10240
---

And the dd run:

$ dd if=/dev/zero of=/local/scratch/tmp/file_dd oflag=direct bs=4M count=2560 status=progress
9739173888 bytes (9.7 GB, 9.1 GiB) copied, 5 s, 1.9 GB/s
2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 5.5149 s, 1.9 GB/s
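
For a third data point alongside dd and elbencho, a minimal O_DIRECT write loop could look like the sketch below (the output path is a placeholder; block size and total size match the runs above). It writes zero-filled 4 MiB blocks, just like dd reading from /dev/zero:

// odirect_write.cpp - minimal O_DIRECT write loop as an independent baseline.
// Build: g++ -O2 -o odirect_write odirect_write.cpp
// (g++ defines _GNU_SOURCE by default, which is needed for O_DIRECT on Linux.)
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const char* path = "/local/scratch/tmp/file_odirect"; // placeholder path
    const size_t blockSize = 4UL * 1024 * 1024;           // 4 MiB, as in the tests above
    const size_t numBlocks = 2560;                        // 10 GiB total

    void* buf = nullptr;
    if (posix_memalign(&buf, 4096, blockSize) != 0)       // O_DIRECT needs an aligned buffer
        return 1;
    memset(buf, 0, blockSize);                            // zero payload, like dd if=/dev/zero

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    auto start = std::chrono::steady_clock::now();

    for (size_t i = 0; i < numBlocks; i++)
        if (write(fd, buf, blockSize) != (ssize_t)blockSize) { perror("write"); return 1; }

    close(fd);

    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    printf("Throughput: %.0f MiB/s\n",
        (numBlocks * blockSize) / (1024.0 * 1024.0) / elapsed.count());

    free(buf);
    return 0;
}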

I feel that I'm missing something obvious here. Any suggestions?

Thank you!

breuner self-assigned this Feb 22, 2025
breuner (Owner) commented Feb 22, 2025

Hi again @jbd,

That's indeed an interesting one. I see no obvious explanation for this. Actually, dd has to do slightly more work in this case, because reading from /dev/zero means the 4 MB buffer gets overwritten with zeros before each write.
In the case of elbencho, the 4 MB buffer is filled with random numbers once during the general initialization phase and then simply gets reused again and again due to "--blockvarpct 0".

I tried to reproduce this but wasn't able to. All the cases that I tried showed either the same speed for both tools or elbencho significantly faster.

Thus, I can only speculate about what could have caused this in your test:

  • Does the drive that you're using perhaps have compression, or can it detect blocks filled with zeros? If this is hard to answer because it's not mentioned in the drive specs, then we could try making elbencho also write only zeros and see whether that changes things. For that, we could replace the randGen.fillBuf call with a memset to zero here: https://github.com/breuner/elbencho/blob/master/source/workers/LocalWorker.cpp#L1245 (see the first sketch below).
  • Is the system where you're running this perhaps using CPU frequency throttling? It could be that the CPU load in the elbencho case is so low that the frequency governor decides not to raise the clock speed. Turning off frequency throttling would be the simple test for this (see the second sketch below).

Those are the only two that come to mind for the moment.
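
To make the first experiment concrete: assuming the block-fill step at the linked LocalWorker.cpp line looks roughly like a randGen.fillBuf(buf, len) call (the names and arguments here are stand-ins, so check the actual source before editing), the zero-fill change would be along these lines:

// Hypothetical stand-in for the block-fill step in LocalWorker.cpp; the function
// name and arguments below are assumptions, not the real elbencho code.
#include <cstddef>
#include <cstring>

void fillWriteBuf(char* ioBuf, size_t ioBufLen)
{
    // before (roughly): randGen.fillBuf(ioBuf, ioBufLen);  // random payload, filled once at init
    // after: all-zero payload, matching what dd copies from /dev/zero
    memset(ioBuf, 0, ioBufLen);
}

If elbencho then reaches dd's ~2 GB/s with zero-filled blocks, that would point to the drive special-casing (compressing or zero-detecting) the data.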
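
For the second point, a small monitor like the sketch below (it assumes the usual Linux cpufreq sysfs files are present) can be left running in a second terminal during both benchmarks to see whether the cores clock up differently:

// cpufreq_watch.cpp - print the current frequency of each core once per second.
// Build: g++ -O2 -std=c++17 -o cpufreq_watch cpufreq_watch.cpp
// Assumes /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq exists
// (typical Linux with cpufreq support); stop with Ctrl-C.
#include <cctype>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main()
{
    namespace fs = std::filesystem;

    for (;;)
    {
        for (const auto& entry : fs::directory_iterator("/sys/devices/system/cpu"))
        {
            std::string name = entry.path().filename().string();
            if (name.rfind("cpu", 0) != 0 || name.size() < 4 ||
                !std::isdigit(static_cast<unsigned char>(name[3])))
                continue; // skip cpufreq, cpuidle, etc.

            std::ifstream freqFile(entry.path() / "cpufreq/scaling_cur_freq");
            long khz = 0;
            if (freqFile >> khz)
                std::cout << name << ": " << khz / 1000 << " MHz  ";
        }
        std::cout << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}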

If you're curious, here are the details of the tests that I ran:

The internal SSD of a compute node - same speed for elbencho and dd:

[sven@node001 ~]$ elbencho -w -b 4M -t 1 --blockvarpct 0 --direct -s 10g file_elbencho 
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :     20.615s     20.615s
            IOPS             :         124         124
            Throughput MiB/s :         496         496
            Total MiB        :       10240       10240
---

[sven@node001 ~]$ dd if=/dev/zero of=file_dd oflag=direct bs=4M count=2560 status=progress
10489954304 bytes (10 GB, 9.8 GiB) copied, 21 s, 499 MB/s
2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 21.4979 s, 499 MB/s

The OS RAID-0 of a DGX box - same speed for dd and elbencho:

sven@dgx01:~/sven/tmp$ elbencho -w -b 4M -t 1 --blockvarpct 0 --direct -s 10g file_elbencho 
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :      8.187s      8.187s
            IOPS             :         312         312
            Throughput MiB/s :        1250        1250
            Total MiB        :       10240       10240
---

sven@dgx01:~/sven/tmp$ dd if=/dev/zero of=file_dd oflag=direct bs=4M count=2560 status=progress
9625927680 bytes (9.6 GB, 9.0 GiB) copied, 8 s, 1.2 GB/s
2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 8.92122 s, 1.2 GB/s

The data RAID-0 of a DGX - elbencho twice as fast as dd, but very short test runtime:

sven@dgx01:/raid/sven$ elbencho -w -b 4M -t 1 --blockvarpct 0 --direct -s 10g file_elbencho 
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :       738ms       738ms
            IOPS             :        3466        3466
            Throughput MiB/s :       13865       13865
            Total MiB        :       10240       10240
---

sven@dgx01:/raid/sven$ dd if=/dev/zero of=file_dd oflag=direct bs=4M count=2560 status=progress
7600078848 bytes (7.6 GB, 7.1 GiB) copied, 1 s, 7.6 GB/s
2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.41539 s, 7.6 GB/s

The data RAID-0 of a DGX again, this time with doubled file size for a slightly longer test runtime - again elbencho twice as fast as dd:

sven@dgx01:/raid/sven$ elbencho -w -b 4M -t 1 --blockvarpct 0 --direct -s 20g file_elbencho 
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :      1.501s      1.501s
            IOPS             :        3409        3409
            Throughput MiB/s :       13636       13636
            Total MiB        :       20480       20480
---

sven@dgx01:/raid/sven$ dd if=/dev/zero of=file_dd oflag=direct bs=4M count=$((2560*2)) status=progress
15359541248 bytes (15 GB, 14 GiB) copied, 2 s, 7.7 GB/s 
5120+0 records in
5120+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 2.8071 s, 7.7 GB/s

breuner added the "question" label (Further information is requested) Feb 22, 2025