
The benchmark result from 2025-02-10 is definitely incorrect #8229


Open
KostyaTretyak opened this issue Feb 11, 2025 · 149 comments

@KostyaTretyak
Contributor

The first inaccuracy, which is immediately noticeable, is the abnormally large gap between benchmarks of the same frameworks on node.js and on bun.js. For example, ditsmod on node.js shows 3.5 K requests per second, while on bun.js it shows 99 K. In reality, ditsmod v3 on bun.js runs 1.5 - 2 times faster, but definitely not 30 times faster.

Even if we take the results for node.js alone, the benchmarks are still clearly incorrect. For example, nestjs-express shows even better results than bare express, which is impossible if both these frameworks are running on the same version of node.js. Correct results should show that express is always 15-20% faster than nestjs-express.

In addition, express should show more than half the performance of ditsmod or fastify.

@waghanza
Collaborator

Could you check the implementation then?

Maybe a cluster thing then

@cyrusmsk
Contributor

cyrusmsk commented Feb 11, 2025

Many frameworks changed significantly.
For example, in Go only one framework is showing numbers close to its previous results.
Every other solution dropped to around 4k-10k from 100k-500k.
Same for the C++ frameworks - a significant drop.

@KostyaTretyak
Contributor Author

@waghanza, I can't tell you what's wrong with the workflow, but something is definitely not going as it should.

This is easy to check:

cd javascript/ditsmod
npm install
npm run build
NODE_APP=dist/main.js node cluster.mjs

From a second terminal, run this command 3-5 times:

wrk -H 'Connection: close' -d 5s -c 8 --timeout 8 -t 4 http://0.0.0.0:3000

The outputs:

Running 5s test @ http://0.0.0.0:3000
  4 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.94ms  386.05us   5.53ms   73.05%
    Req/Sec     1.87k   108.61     2.05k    80.50%
  37315 requests in 5.00s, 3.35MB read
Requests/sec:   7457.23
Transfer/sec:    684.55KB

Stop ditsmod in the first terminal and then run these commands:

cd ../ditsmod-bun
bun install
bun run build
bun run start

From a second terminal, run this command 3-5 times:

wrk -H 'Connection: close' -d 5s -c 8 --timeout 8 -t 4 http://0.0.0.0:3000

The last command outputs:

Running 5s test @ http://0.0.0.0:3000
  4 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   414.17us  473.23us  10.70ms   91.65%
    Req/Sec     3.70k   550.42     6.43k    93.07%
  74308 requests in 5.10s, 5.31MB read
  Socket errors: connect 0, read 74302, write 0, timeout 0
Requests/sec:  14568.88
Transfer/sec:      1.04MB

So, ditsmod runs about 2x as fast on Bun.js as it does on Node.js, but definitely not 30x as shown in the benchmark results from 2025-02-10.

@waghanza
Collaborator

Could you do the same for all three endpoints?

Maybe results are totally wrong for one of them

@KostyaTretyak
Contributor Author

It doesn't make sense to do this for all of them, but, for example, for Koa:

  • Node.js - 7190 (3254 in the last benchmarks on the website)
  • Bun.js - 12219 (93368 in the last benchmarks on the website)

@akyoto
Contributor

akyoto commented Feb 11, 2025

The numbers are definitely wrong.
Take a look at Go and compare it with the 2024-12 results.

I have no clue why my own library is the only one showing the same results.

I'm just guessing, but maybe the machine got occupied with other tasks during the benchmark and a re-run could fix it?

@KostyaTretyak
Contributor Author

No, the current results should differ from 2024-12 because the HTTP keep-alive feature is not currently used. But still, the benchmarks are definitely incorrect.

@waghanza
Collaborator

The whole command is

wrk -H 'Connection: close' --connections 64 --threads 8 --duration 15 --timeout 1 --script /home/waghanza/web-frameworks/pipeline.lua http://172.17.0.2:3000/

@KostyaTretyak

The full log is here => https://gist.github.com/waghanza/92366d18948972463a918f3e3ece2be7

Seems that there are a lot of socket errors in favor of bun

@KostyaTretyak
Contributor Author

@waghanza, regarding bun errors, I don't know if the benchmark results can be trusted. Other frameworks also throw these errors when running on bun.

But these logs do not contain any log output from the framework itself. Why? Is it intentionally disabled?

These logs should look like:

info [AppModule]: PreRouterExtension: setted route GET "/".

@Kaliumhexacyanoferrat
Contributor

Kaliumhexacyanoferrat commented Feb 12, 2025

So are we sure that we do not have server implementations that just ignore the connection close instruction as it is not common anymore? 🤔

To be honest, it now feels more like a TCP handshake benchmark than a web server benchmark (or maybe more of an HTTP/1.0 one), assuming I understood the new changes correctly.

@KostyaTretyak
Contributor Author

server implementations that just ignore the connection close instruction as it is not common anymore

The server receives not just this instruction but also the HTTP version. If it does not follow the instruction to close the connection, then by the same logic it could ignore any other instruction at its discretion.

Can compliance with these instructions be easily verified?

To be honest it now feels more like a TCP handshake benchmark than a web server benchmark

Single Page Applications (which are very popular now) often use two web servers: one for static files, one for dynamic requests. Benchmarks are usually run for frameworks handling dynamic requests, and in probably 90% of cases a feature like keep-alive is not used (in particular because a single GraphQL query is used).

So, in the real world, dynamic queries are handled by closing the connection after a few requests (usually 1-3 requests). In my opinion, in real-world conditions, dynamic queries never get thousands of requests on a single connection, as has been the case in benchmarks so far.

@Kaliumhexacyanoferrat
Contributor

Kaliumhexacyanoferrat commented Feb 13, 2025

Can compliance with these instructions be easily verified?

You could run curl -v http://server http://server with the Connection: close header, which issues two requests and reports whether the connection has been re-used or not.

If you look at the results for Java it is pretty obvious that the first two frameworks are not HTTP compliant. See RFC 9112 section 9.6.
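
A minimal shell check along these lines (just a sketch, assuming the server under test listens on localhost:3000; the exact wording of curl's verbose log varies between versions):

# send two requests in one curl process, both with Connection: close;
# a compliant server closes the TCP connection after the first response,
# so curl has to open a second connection for the second request
curl -v -H 'Connection: close' http://localhost:3000/ http://localhost:3000/ 2>&1 \
  | grep -iE 'connected to|re-using|closing'

# non-compliant server: the log shows something like "Re-using existing connection"
# compliant server:     the log shows a second "Connected to localhost" line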

In my opinion, in real-world conditions, dynamic queries never get thousands of requests on a single connection, as has been the case in benchmarks so far.

I agree, both ends of this range feel kind of synthetic.

@cyrusmsk
Contributor

Many of these things were already described in the previous issue: #8116

Andrea even prepared a simple validation toolkit
And yes - many solutions from this repo don’t give a %%%% about the HTTP standard :)

@akyoto
Contributor

akyoto commented Feb 13, 2025

But why would somebody send "Connection: close" and immediately follow up with another request on the same connection?

I believe that's what's happening here in the benchmarks and this combination doesn't seem to make sense.

The RFC also states this:

Connection: close as a request header field indicates that this is the **last** request
that the client will send on this connection
A client that sends a "close" connection option MUST NOT send further requests on that connection

^ The benchmark seems to violate this rule.

@KostyaTretyak
Contributor Author

The benchmark seems to violate this rule

These rules should work for real-world applications. But in the real world, no web server receives thousands of requests per second on a single connection. Closing the connection after each request brings the benchmarks much closer to real-world usage than allowing thousands of requests to be made on a single connection.

@Kaliumhexacyanoferrat
Contributor

Closing the connection after each request brings the benchmarks much closer to real-world usage than allowing thousands of requests to be made on a single connection.

Then the benchmark should actually discard the connection and not re-use it as it obviously happens.

@akyoto
Contributor

akyoto commented Feb 13, 2025

The benchmark seems to violate this rule

These rules should work for real-world applications. But in the real world, no web server receives thousands of requests per second on a single connection. Closing the connection after each request brings the benchmarks much closer to real-world usage than allowing thousands of requests to be made on a single connection.

I think you misunderstood what I was trying to say. I'm saying the client doesn't follow the RFC here and needs to be fixed.

@KostyaTretyak
Contributor Author

@akyoto, why are you sure that the client is trying to send requests to an already closed connection? Why can't subsequent requests open a new connection?
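
For illustration, a client that never reuses a connection is easy to sketch in shell (hypothetical, assuming a server on port 3000 as in the examples above; each curl process stands in for one benchmark request):

# each iteration opens a fresh TCP connection, sends a single request
# with Connection: close, reads the response, and exits
for i in $(seq 1 5); do
  curl -s -o /dev/null -w 'got %{http_code} on a new connection\n' \
    -H 'Connection: close' http://0.0.0.0:3000/
done

Whether the benchmark client actually behaves like this, or keeps reusing the original socket, is exactly the open question in this thread.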

@akyoto
Contributor

akyoto commented Feb 13, 2025

@akyoto, why are you sure that the client is trying to send requests to an already closed connection?

It's based on the numbers I see. My reverse-proxy targeted web server ignores "Connection: close" and is the only Go library that keeps getting the same results as in 2024-12 (I might implement it in the future, or not, but that's off-topic).

So, logically speaking, the client must be sending the same number of requests as in 2024.

But it shouldn't, because it sent a "Connection: close".

The client has an obligation to stop sending requests after that.

@KostyaTretyak
Contributor Author

My reverse-proxy targeted web server ignores "Connection: close"

This is a problem on your web server's side. It's clearly not working according to the HTTP rules you're urging us to follow.

So, logically speaking, the client must be sending the same number of requests as in 2024.

You gave an example of how your web server works and concluded that since your web server ignores this instruction, then there is no need to send these instructions. But, in fact, this is not logical at all.

@akyoto
Contributor

akyoto commented Feb 13, 2025

My reverse-proxy targeted web server ignores "Connection: close"

This is a problem on your web server's side. It's clearly not working according to the HTTP rules you're urging us to follow.

Whether that's a problem or not is irrelevant and off-topic here.
I could go on explaining how it's not a problem in certain use-cases, but again, I'll spare you the off-topic conversations.
Let's focus on the actual problem.

So, logically speaking, the client must be sending the same number of requests as in 2024.

You gave an example of how your web server works and concluded that since your web server ignores this instruction, then there is no need to send these instructions. But, in fact, this is not logical at all.

No @KostyaTretyak that's not what I'm saying at all.

It's not necessary to stop sending "Connection: close".

You have 2 solutions here:

a) Send "Connection: close" and stop sending requests after that.
b) Not send "Connection: close" and keep the old behavior.

You probably misunderstood my post as a call to use b) as a solution, but that's not my point at all.
You can use a) or b), I don't care.
Either one is fine.
But a solution has to be decided on.

@KostyaTretyak
Contributor Author

Send "Connection: close" and stop sending requests after that

But why? Above you explained that you want the client to not send requests to an already closed connection. But you still didn't answer my question:

@akyoto, why are you sure that the client is trying to send requests to an already closed connection? Why can't subsequent requests open a new connection?

@Kaliumhexacyanoferrat
Contributor

Well, because you want to test web server frameworks and not bananas, I suppose?

A client that sends a "close" connection option MUST NOT send further requests on that connection (after the one containing the "close") and MUST close the connection after reading the final response message corresponding to this request.

@akyoto
Contributor

akyoto commented Feb 13, 2025

The golden rule of network communications is to never trust the other side.
It doesn't matter what the server does - the client can't trust the server and it needs to stop sending requests.
This is not only required by the RFC but it also makes sense from a logical standpoint.

Either way, I don't think people understand that I'm arguing against my own benefit here.

I mean, if you want to keep sending requests on a connection with this header on it, be my guest.
It'll make the numbers of servers with non-compliant implementations look much better than they are.

So, do you want to improve the accuracy of the benchmark, or not?

Because I can guarantee you, there will be many more servers ignoring the header.
Complaining about oranges being orange won't change anything.
Do you want more accurate numbers or not?

@waghanza
Collaborator

Thanks for your insights, and glad there is a debate here 🎉

Not sure I understand

Then the benchmark should actually discard the connection and not re-use it as it obviously happens.

@Kaliumhexacyanoferrat

The whole idea of this project is to test frameworks in a real-world scenario:

  • we have disabled keep-alive, which messes up the results with respect to the above goal
  • the scenario will change with database loading, serialization ....
  • the benchmark will not run indefinitely on my workstation; a real infrastructure will be used
  • we definitely need to check some compliance here; @trikko spotted some frameworks that are not HTTP compliant, like others

@akyoto
Contributor

akyoto commented Feb 13, 2025

@waghanza The problem is that we are talking about 2 different things here.

You talk about the testing methodology. Nobody here has a problem with your testing methodology.
Sure, we could go into the rabbit hole of discussing what's more "real", but from my understanding of the people posting here nobody criticizes this decision.

The problem is the testing implementation itself, which is incorrect: your client does not adhere to common principles like no longer sending requests after it has told the server to close the connection. The specs explicitly say the client must not do that, and because of this buggy implementation you get a lot of incorrect outcomes. Even if we disregard the specs for a second and just think about it: why would you tell everybody that the highway is going to be closed and then attempt to drive over said highway? You should not drive over a highway that is about to close; the correct thing to do is to find a new one.

methodology != implementation

@waghanza
Collaborator

Understood. I do not take it as criticism but as a way to improve the whole project.

As I understand it, the tool (wrk) sends HTTP requests after closing the connection?

@trikko
Contributor

trikko commented Feb 13, 2025

How can it send data on a TCP connection if the TCP connection is closed? Does anyone have an example to debug?

@akyoto
Contributor

akyoto commented Feb 13, 2025

As I understand it, the tool (wrk) sends HTTP requests after closing the connection?

It doesn't close it after reading the final response message, and that's precisely the problem.

A client that sends a "close" connection option MUST NOT send further requests on that connection (after the one containing the "close") and MUST close the connection after reading the final response message corresponding to this request.

@trikko
Contributor

trikko commented Feb 13, 2025

This thing could be a little misleading: if the server responds with Connection: close, the server itself should close the connection. The server in this case is not required to send Content-Length, so the client can't close the connection itself, since it probably can't even tell whether the response is complete, and has to wait for the server to close it, AFAIK!

Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.

That's backward compatibility with HTTP/1.0, where connection keep-alive didn't exist and the connection was closed after each request.
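
A quick way to observe this from a shell is a raw request (a sketch; depending on your nc flavor you may need -q 1 or -N so that it keeps reading until the peer closes):

# send one request announcing Connection: close and keep reading until the
# server closes the socket; without a Content-Length, that close is what
# delimits the body, as described above
printf 'GET / HTTP/1.1\r\nHost: 0.0.0.0\r\nConnection: close\r\n\r\n' | nc 0.0.0.0 3000

If the server honors Connection: close, the command returns as soon as the response is over; if it ignores the header, the command just hangs on the still-open connection.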

@cyrusmsk
Contributor

cyrusmsk commented Apr 7, 2025

For the obvious reason - no one is interested in incorrect results.

which is not really true.. the TechEmpower benchmark is total junk - but people are still interested :D

@kanarus
Contributor

kanarus commented Apr 7, 2025

the TechEmpower benchmark is total junk - but people are still interested

There’s some truth to that, but I think there is confusion between two types of sources of incorrectness:

  • performance differences caused by application layer
    • will be increased by complex ( close-to-real-world ) application requirements
    • if a benchmark requires a complex application, some frameworks may end up excessively optimizing their implementations just for the benchmark
  • accuracy of measurement
    • will be affected by the environment or process of benchmark itself, usually set up by benchmark's maintainers
    • in this regard, TechEmpower's benchmark seems reliable to some extent; in other words, they appear to, at least, accurately map each submitted implementation to a reasonable result on their setup.
                 application          measurement
  TechEmpower    relatively complex   reliable to some extent
  This project   very simple          ?

From my understanding,

  • @KostyaTretyak is pointing out that the ? of measurement should be clarified and improved, and @waghanza agrees
  • @cyrusmsk's comment refers to the relatively complex application layer of TechEmpower's benchmark, which can make a significant difference in results that isn't actually caused by the frameworks themselves.

( is this right? )


In my opinion, one of the strong points of this project is the simplicity of its application layer, which minimizes the noise from the application implementation.
But this benefit is meaningful only when the measurement is reliable!

@waghanza
Collaborator

waghanza commented Apr 8, 2025

As we know (at least it is documented), we have some performance gap, for example between bun and node.

In my opinion, the first step to stabilizing results is to find the right tool / methodology to get closer to this.

When I have time, I will create a tool that outputs results with various options, perhaps oha, bombardier and some others, and I will publish the results here as markdown so we can analyse them together.

I'll still publish results, at least we use the same tooling as techempower 😛

@waghanza
Collaborator

waghanza commented Apr 8, 2025

PS: We are still not in production since this is a local Docker setup ... TechEmpower uses real networks and real machines

@akyoto
Contributor

akyoto commented Apr 8, 2025

In my opinion, the first step to stabilizing results is to find the right tool / methodology to get closer to this.

When I have time, I will create a tool that outputs results with various options, perhaps oha, bombardier and some others, and I will publish the results here as markdown so we can analyse them together.

This is a waste of time1.
Please read this earlier comment.

Footnotes

  1. excluding the markdown results

@KostyaTretyak
Contributor Author

PS: We are still not in production since this is a local Docker setup ... TechEmpower uses real networks and real machines

@waghanza, so, this raises a key question: is any other work being done on the computer where Docker is installed for the benchmarks? For example, are you watching a movie or (God forbid) playing a game while the benchmark results are being prepared? Please just answer honestly =). These three Ditsmod benchmark results strongly suggest it: 172,574 - 174,300 - 274,841.

@akyoto
Contributor

akyoto commented Apr 8, 2025

@waghanza, so, this raises a key question: is any other work being done on the computer where Docker is installed for the benchmarks? For example, are you watching a movie or (God forbid) playing a game while the benchmark results are being prepared? Please just answer honestly =). These three Ditsmod benchmark results strongly suggest it: 172,574 - 174,300 - 274,841.

It's funny that we went full circle on this 😄

  1. Me suspecting noise in the results, Kostya telling me that it's not noise.
  2. Kostya suspecting noise in the results, me telling Kostya that it's not noise.

@KostyaTretyak if you take a look at the Go page on 2025-03-27 and 2025-03-26 and 2025-02-18 you can notice that they have reproducible bad results on specific (not random) frameworks.

So this can not be random noise from other tasks, otherwise multiple benchmarks would show the noise on different frameworks. But it's always the same frameworks, so this can't be random noise.

This really went full circle 😅

@KostyaTretyak
Contributor Author

@akyoto, when something radically changes, like adding the Connection: close header, it drastically affects the results, so trying to spot "noise" there is pointless. But now, when we have identical settings to the 2024-12 benchmarks, it is possible to see it.

I don’t see what you’re saying about how supposedly only specific frameworks show deviations from 2024-12. In my opinion, there are both types of deviations: random and non-random. For example, the incorrect difference (Node vs. Bun) in the performance of frameworks like Express, Fastify, Koa... These are specific frameworks. And then there’s also random deviation, like in the case of Ditsmod, where two results show lower numbers, and the third result suddenly spikes by 60%.

@akyoto
Contributor

akyoto commented Apr 8, 2025

@KostyaTretyak I see what you're saying but when everything about results from 2025 smells (keep-alive or not), then I would try to run the benchmarks on a different machine & OS or without potentially buggy virtualization software on a remote machine and see if the noise persists there.

I can try your ditsmod later on my system and post the results together with express to see if we get stable ratios there.

@waghanza Also can we please address the elephant in the room:
Most web servers (>95%) out there use Linux - why are we running the benchmarks on a Mac M1?

@cyrusmsk
Contributor

cyrusmsk commented Apr 8, 2025

@KostyaTretyak I see what you're saying but when everything about results from 2025 smells (keep-alive or not), then I would try to run the benchmarks on a different machine & OS or without potentially buggy virtualization software on a remote machine and see if the noise persists there.

I can try your ditsmod later on my system and post the results together with express to see if we get stable ratios there.

@waghanza Also can we please address the elephant in the room: Most web servers (>95%) out there use Linux - why are we running the benchmarks on a Mac M1?

The whole project is open source
And anyone can run all the frameworks on their own machine and OS..

@akyoto
Contributor

akyoto commented Apr 8, 2025

The whole project is open source And anyone can run all the frameworks on their own machine and OS..

I have done exactly that.
And the conclusion was that the results are far more realistic and stable on an actual Linux machine.
If the results are so much better, then why are we not using a real Linux machine?

@KostyaTretyak
Contributor Author

I can try your ditsmod later on my system and post the results together with express to see if we get stable ratios there.

You can see the results on Techempower.

  1. Click the "visualize" link on any item in the list (except the last one, which might still be incomplete), and wait about 10 seconds for the results to load.
  2. Click the "Show filters panel" button, then go to "Language -> Disable all", then pick JavaScript & TypeScript, and at the bottom, press the "Apply changes" button.
  3. After each filter change, reload the page (this is a workaround for a filter bug).

It’s a pity that their "composite score" tab is gone. You could see all the results summarized there.

@akyoto
Contributor

akyoto commented Apr 8, 2025

TechEmpower is a terrible source, I'd rather not look at a benchmark that allows cheating in plain sight, resulting in completely useless numbers for the end-user.

Thanks but I'll test it on my own.

@KostyaTretyak
Contributor Author

@akyoto , consider the context in which I gave you the link to TechEmpower. You were going to check the stability of Ditsmod's performance. So you can even ignore the comparison with other frameworks, just look at the stability of the performance.

@waghanza
Collaborator

waghanza commented Apr 8, 2025

@akyoto I'll read all comments carefully later, but

why are we running the benchmarks on a Mac M1?

is not true

sorry if I let you think that, but the results run on M1 hardware - not on macOS, but on https://asahilinux.org/

@phphleb
Contributor

phphleb commented Apr 8, 2025

When discussing the basis on which benchmarks are conducted, I can add that at the end of last year, I deployed a project locally and ran benchmarks on my computer. The instructions for launching are available in the readme. I compared about a dozen frameworks, and the percentage-based results between them were close to what was observed in the official tests.

@akyoto
Contributor

akyoto commented Apr 8, 2025

@KostyaTretyak

             js/ditsmod    js/express
  wrk1       876,729       188,550
  oha2       70,496        61,184

Details3.

Would a bun version be helpful for comparison?

Footnotes

  1. clustered, keep-alive

  2. clustered, no keep-alive

  3. node v23.8.0 | Arch Linux | i5 13600k | forced constant CPU clock | customized sysctl net

@KostyaTretyak
Contributor Author

KostyaTretyak commented Apr 8, 2025

@akyoto , I think oha is showing incorrect data. I have my own benchmarks where I use ab and wrk (well-known reliable utilities) with Connection: close header. Although there is no Ditsmod v3 yet (there is only v2).

@akyoto
Contributor

akyoto commented Apr 8, 2025

Let's put oha aside for now.

Your point was that the current wrk results from 2025-04-07 published by @waghanza are wrong for your framework.
That was the basis on which you asked him to remove your framework from the results.

But the results on my machine seem to indicate extremely similar ratios to the results on the website.
For me it looks like nothing is wrong there.

@KostyaTretyak
Contributor Author

  • Your results: Ditsmod - 876,729, Express - 188,550, ratio 876,729 / 188,550 ~ 4.65
  • Results on the website: Ditsmod - 172,574, Express - 83,400, ratio 172,574 / 83,400 ~ 2.07

Is this your "extremely similar ratios"?

@akyoto
Contributor

akyoto commented Apr 8, 2025

@KostyaTretyak
Sorry the javascript filter on the site confused me, I was looking at a different one.

Yeah that does seem a little suspicious.

To be fair, there is also a big difference in CPU architectures here which could explain it.

Imagine this example: your framework consumes just slightly more than the instruction/data cache size of his CPU.
Data would have to be re-fetched and invalidated, causing cache misses all over the place.
On a different CPU, where the instructions and data just barely fit into the cache, there are zero cache misses.
This could easily lead to the differences shown here.

I'm not saying that this is what's happening, but it's one of the many examples how ratios on other CPUs can differ.

@cyrusmsk
Contributor

cyrusmsk commented Apr 8, 2025

@akyoto , I think oha is showing incorrect data. I have my own benchmarks where I use ab and wrk (well-known reliable utilities) with Connection: close header. Although there is no Ditsmod v3 yet (there is only v2).

Both ab and wrk are old and outdated solutions,
with known issues and problems (many articles have described them, and other projects like wrk2 and rewrk have tried to fix them).

And also, in your own benchmark the ratio (7200/3500) ~ 2x is close to what is published on the website…

@KostyaTretyak
Contributor Author

And also, in your own benchmark the ratio (7200/3500) ~ 2x is close to what is published on the website…

But I was pointing out a different issue. About the ratio - that was a response, not a remark.

Besides that (I thought it was obvious), by wrk I mean the very latest version of that utility.

@waghanza
Collaborator

waghanza commented Apr 8, 2025

I read all the posts here carefully.

The best idea, imho, is not to compare with TechEmpower or any other tools, but:

  • to use only the / endpoint and 200 responses (simplification)
  • check with at least a few tools (best to have 3, perhaps wrk / wrk2, vegeta and oha) - see the sketch below
  • we could get an approximate ratio for two pairs: oha / vegeta and wrk / wrk2
  • choose the pair whose ratios come closest to what we can find on the internet (bun is supposed to be ~4x faster than node ....)
  • check logs for deeper debugging if required
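
(Something like the following could be a starting point - a sketch only; the wrk flags mirror the command already used above, while the oha and vegeta flags are written from memory and should be checked against each tool's --help before relying on them.)

# hit the same endpoint with several load generators, keep-alive disabled everywhere
URL=http://127.0.0.1:3000/

wrk -H 'Connection: close' --connections 64 --threads 8 --duration 15 --timeout 1 "$URL"
oha --no-tui --disable-keepalive -c 64 -z 15s "$URL"
echo "GET $URL" | vegeta attack -duration=15s -keepalive=false | vegeta report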

@akyoto
Contributor

akyoto commented Apr 8, 2025

My best guess is that it's the difference between x86-641 and arm642 that is at fault here.
Most people are used to benchmark results from Linux x86-64 systems, not Linux arm64 running on Apple Silicon3.

js/ditsmod vs js/express going from 4.6x to 2x ratio is one example.
go/web vs go/fasthttp goes from 1.1x to suddenly 5x ratio with keep-alive off (2.3x to 1.05x with keep-alive on).
There are probably more examples.

I would be very interested in seeing official benchmark results on a Linux x86-64 machine.

I think it would clear a lot of the confusion about "why" the results don't match with existing expectations.

Footnotes

  1. I use x86-64 as a synonym for amd64.

  2. I use arm64 as a synonym for aarch64.

  3. Asahi Linux is also a rather exotic distribution among Linux distros. Another source of confusion about the performance could be the 16k page size alignment since most people are still used to x86-64 on 4k page size.

@waghanza
Collaborator

waghanza commented Apr 8, 2025

Sorry, forget that - it will run on both arches 😛

@akyoto
Contributor

akyoto commented Apr 9, 2025

@waghanza

The best idea, imho, is not to compare with TechEmpower or any other tools, but:

* to use only the / endpoint and 200 responses (simplification)

I have made a 5 minute video response, please take a look: MP4 | WEBM

1 2

Footnotes

  1. https://github.com/the-benchmarker/web-frameworks/issues/2007#issuecomment-557758131

  2. https://github.com/the-benchmarker/web-frameworks/issues/2007#issuecomment-557941296

@waghanza
Collaborator

waghanza commented Apr 9, 2025

Thanks for your video @akyoto. We are pretty aligned on the main idea.

You are saying that we should test frameworks, not external libraries; this is the subject of #7737 and it is planned, but a little bit complex to iterate on for now.

But before that, we need to stabilize the results, imho.

waghanza added a commit that referenced this issue Apr 28, 2025
* update

* update lihil and responder

* fix vixeny

* update
@waghanza
Collaborator

@KostyaTretyak

I updated results on x86_64 -> https://web-frameworks-benchmark.netlify.app/result?f=express,nestjs-express,hyper-express,express-bun

Still working on something to compare tools / arch to help us find an accurate toolset
