The benchmark result from 2025-02-10 is definitely incorrect #8229
Comments
Could you check the implementation then? Maybe it's a cluster thing? |
Many frameworks changed significantly |
@waghanza, I can't tell you what's wrong with the workflow, but something is definitely not going as it should. This is easy to check:
cd javascript/ditsmod
npm install
npm run build
NODE_APP=dist/main.js node cluster.mjs
From a second terminal, run this command 3-5 times:
wrk -H 'Connection: close' -d 5s -c 8 --timeout 8 -t 4 http://0.0.0.0:3000
The outputs:
Stop ditsmod in the first terminal and then run these commands:
cd ../ditsmod-bun
bun install
bun run build
bun run start
From a second terminal, run this command 3-5 times:
wrk -H 'Connection: close' -d 5s -c 8 --timeout 8 -t 4 http://0.0.0.0:3000
The last command outputs:
So, ditsmod runs about 2x as fast on Bun as it does on Node.js, but definitely not 30x as shown in the benchmark results from 2025-02-10. |
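To make the "run this command 3-5 times" step less manual, a small helper along these lines could repeat the wrk run and print each result plus the median. This is only a sketch; the file name, run count and default target URL are assumptions, not part of the repository:

```ts
// bench-repeat.ts (hypothetical helper): run wrk several times against one
// endpoint and print each run's Requests/sec plus the median, so single-run
// noise is easier to spot.
import { execFileSync } from "node:child_process";

const URL = process.argv[2] ?? "http://0.0.0.0:3000"; // example target
const RUNS = 5;

const results: number[] = [];
for (let i = 0; i < RUNS; i++) {
  const out = execFileSync(
    "wrk",
    ["-H", "Connection: close", "-d", "5s", "-c", "8", "--timeout", "8", "-t", "4", URL],
    { encoding: "utf8" }
  );
  // wrk prints a line like "Requests/sec:  12345.67"
  const match = out.match(/Requests\/sec:\s+([\d.]+)/);
  if (match) {
    const rps = Number(match[1]);
    results.push(rps);
    console.log(`run ${i + 1}: ${rps.toFixed(0)} req/s`);
  }
}

results.sort((a, b) => a - b);
if (results.length > 0) {
  console.log(`median: ${results[Math.floor(results.length / 2)].toFixed(0)} req/s`);
}
```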
Could you do the same for all three endpoints? Maybe the results are totally wrong for one of them. |
It doesn't make sense to do this for every framework, but here it is for Koa, for example:
|
The numbers are definitely wrong. I have no clue why my own library is the only one showing the same results. I'm just guessing, but maybe the machine got occupied with other tasks during the benchmark and a re-run could fix it? |
No, the current results should be different from 2024-12 because the HTTP keep-alive feature is not currently used. But still, the benchmarks are definitely incorrect. |
The whole command is:
wrk -H 'Connection: close' --connections 64 --threads 8 --duration 15 --timeout 1 --script /home/waghanza/web-frameworks/pipeline.lua http://172.17.0.2:3000/
The full log is here => https://gist.github.com/waghanza/92366d18948972463a918f3e3ece2be7
It seems there are a lot of socket errors, in favor of bun |
@waghanza, regarding bun errors, I don't know if the benchmark results can be trusted. Other frameworks also throw these errors when running on bun. But these logs do not contain logs from the framework. Why? Are they specifically disabled? These logs should look like:
|
So are we sure that we do not have server implementations that just ignore the connection close instruction, since it is not common anymore? 🤔 To be honest, it now feels more like a TCP handshake benchmark than a web server benchmark (or maybe more of an HTTP/1.0 one), assuming I understood the new changes correctly. |
The server receives not just this instruction, but also the HTTP version. If it does not follow the instruction to close the connection, then it can just as easily ignore any other instruction at its discretion. Can compliance with these instructions be easily verified?
Single Page Applications (which are very popular now) often use two web servers: one for static files and one for dynamic requests. Benchmarks are usually run against the frameworks handling dynamic requests, and probably in 90% of cases a feature like keep-alive is not used (in particular because a single GraphQL query is used). So, in the real world, dynamic requests are handled by closing the connection after a few requests (usually 1-3). In my opinion, in real-world conditions, dynamic endpoints never receive thousands of requests over a single connection, as has been the case in these benchmarks so far. |
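On the question of whether compliance can easily be verified: one rough, stand-alone way (a sketch, not part of the project's tooling; host, port and timeout are assumptions) is to open a raw TCP connection, send a single request with Connection: close, and watch whether the server ever closes its side:

```ts
// check-close.ts (illustrative): send one request with "Connection: close"
// over a raw TCP socket and report whether the server closes its side of the
// connection, as RFC 9112 section 9.6 requires.
import net from "node:net";

const HOST = "127.0.0.1"; // assumed host/port of the server under test
const PORT = 3000;

const socket = net.connect(PORT, HOST, () => {
  socket.write(
    `GET / HTTP/1.1\r\nHost: ${HOST}:${PORT}\r\nConnection: close\r\n\r\n`
  );
});

let response = "";
socket.on("data", (chunk) => (response += chunk.toString()));

// A compliant server closes the connection after the final response,
// which fires "end" on our side.
socket.on("end", () => {
  console.log("server closed the connection:", response.split("\r\n")[0]);
  process.exit(0);
});

socket.on("error", (err) => {
  console.error("socket error:", err.message);
  process.exit(1);
});

// If nothing happens within a few seconds, the server most likely kept the
// connection open despite the "close" option.
setTimeout(() => {
  console.log("connection still open after 3s; server ignored 'close'?");
  socket.destroy();
  process.exit(1);
}, 3000);
```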
You could. If you look at the results for Java, it is pretty obvious that the first two frameworks are not HTTP compliant; see RFC 9112, Section 9.6.
I agree, both ends of this range feel kind of synthetic. |
Many of these things were already described in a previous issue: #8116. Andrea even prepared a simple validation toolkit. |
But why would somebody send "Connection: close" and immediately follow up with another request on the same connection? I believe that's what's happening here in the benchmarks, and this combination doesn't seem to make sense. The RFC also states this: "A client that sends a 'close' connection option MUST NOT send further requests on that connection (after the one containing 'close') and MUST close the connection after reading the final response message corresponding to this request."
^ The benchmark seems to violate this rule. |
These rules should work for real-world applications. But in the real world, no web server receives thousands of requests per second on a single connection. Closing the connection after each request brings the benchmarks much closer to real-world usage than allowing thousands of requests to be made on a single connection. |
Then the benchmark should actually discard the connection and not re-use it as it obviously happens. |
I think you misunderstood what I was trying to say. I'm saying the client doesn't follow the RFC here and needs to be fixed. |
@akyoto, why are you sure that the client is trying to send requests to an already closed connection? Why can't subsequent requests open a new connection? |
It's based on the numbers I see. My reverse-proxy-targeted web server ignores "Connection: close" and is the only Go library that keeps getting the same results as in 2024-12 (I might implement it in the future, or not, but that's off-topic). So, logically speaking, the client must be sending the same number of requests as in 2024. But it shouldn't, because it sent a "Connection: close". The client has an obligation to stop sending requests after that. |
This is a problem on your web server's side. It's clearly not working according to the HTTP rules you're urging us to follow.
You gave an example of how your web server works and concluded that, since your web server ignores this instruction, there is no need to send it at all. But, in fact, that is not logical. |
Whether that's a problem or not is irrelevant and off-topic here.
No @KostyaTretyak, that's not what I'm saying at all. It's not necessary to stop sending "Connection: close". You have 2 solutions here: a) Send "Connection: close" and stop sending requests on that connection after that. b) Stop sending "Connection: close" and keep reusing the connection. You probably misunderstood my post as a call to use b) as a solution, but that's not my point at all. |
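For illustration, option (a) boils down to one connection per request on the client side: send "Connection: close", read the whole response, then open a brand-new connection for the next request. A minimal sketch, assuming Node's built-in http module and an example target URL (this is not the benchmark's actual client, which is wrk):

```ts
// one-connection-per-request.ts (illustrative): send "Connection: close",
// read the whole response, then use a fresh connection for the next request
// instead of reusing the old one.
import http from "node:http";

const TARGET = "http://127.0.0.1:3000/"; // example target
const REQUESTS = 5;

function requestOnFreshConnection(): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = http.get(
      TARGET,
      { agent: false, headers: { Connection: "close" } }, // no pooling/reuse
      (res) => {
        res.resume(); // drain the body
        res.on("end", () => resolve(res.statusCode ?? 0));
      }
    );
    req.on("error", reject);
  });
}

(async () => {
  for (let i = 0; i < REQUESTS; i++) {
    console.log(`request ${i + 1}: HTTP ${await requestOnFreshConnection()}`);
  }
})();
```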
But why? Above you explained that you want the client to not send requests to an already closed connection. But you still didn't answer my question:
|
Well, because you want to test web server frameworks and not bananas, I suppose?
|
The golden rule of network communications is to never trust the other side. Either way, I don't think people understand that I'm arguing against my own benefit here. I mean, if you want to keep sending requests on a connection with this header on it, be my guest. So, do you want to improve the accuracy of the benchmark, or not? Because I can guarantee you, there will be many more servers ignoring the header. |
Thanks for your insights, and glad there is a debate here 🎉 Not sure I understand
The whole idea of this project is to test frameworks in a real-world scenario:
|
@waghanza The problem is that we are talking about 2 different things here. You talk about the testing methodology. Nobody here has a problem with your testing methodology. The problem is the testing implementation itself, which is incorrect because your client does not adhere to common principles such as no longer sending requests after it tells the server to close the connection. The specs explicitly say the client should not do that, and because of this buggy implementation you get a lot of incorrect outcomes.
Even if we disregard the specs for a second and just think about it: why would you tell everybody that the highway is going to be closed and then attempt to drive over said highway? You should not drive over a highway that is about to close. The correct thing to do is to find a new one.
methodology != implementation |
Understood. I do not take it as criticism but as a way to improve the whole project. As I understand it, the tool (wrk) sends an HTTP request after closing the connection?
How can it send data on a TCP connection if the TCP connection is closed? Does anyone have an example to debug?
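Since an example to debug was asked for, here is a self-contained sketch (illustrative only; the port, payload and file name are assumptions) of a toy server that closes the connection after one response, and a client that then tries to send a second request on that same connection; the write fails, which is the kind of thing wrk reports as a socket error:

```ts
// reuse-after-close.ts (illustrative): the toy server answers one request and
// closes; the client then tries another request on the SAME connection and
// the write fails with a socket error.
import net from "node:net";

const server = net.createServer((socket) => {
  socket.once("data", () => {
    socket.end(
      "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 2\r\n\r\nok"
    );
  });
});

server.listen(3002, "127.0.0.1", () => {
  const req = "GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n";
  const client = net.connect(3002, "127.0.0.1", () => client.write(req));

  client.on("data", (chunk) => console.log(chunk.toString().split("\r\n")[0]));

  client.on("end", () => {
    // The server has closed its side. A compliant client would open a NEW
    // connection here; writing on this one fails instead.
    client.write(req);
  });

  client.on("error", (err) => {
    console.log("socket error:", err.message); // EPIPE-style: peer already closed
    client.destroy();
    server.close();
  });
});
```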
It doesn't close it after reading the final response message, and that's precisely the problem.
|
This thing could be a little misleading: if the server responds with Connection: close, the server itself should close the connection. The server in this case is not required to send Content-Length, so the client can't close the request itself, since it probably can't even tell whether the response is complete, and has to wait for the close from the server, AFAIK!
That's for backward compatibility with HTTP/1.0, where connection keep-alive didn't exist and the connection was closed after each request.
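To make the close-delimited framing concrete, here is a tiny raw-TCP sketch (illustrative; the port and body are assumptions) of a response that carries Connection: close but no Content-Length, so the only thing telling the client the body is complete is the server closing the connection:

```ts
// close-delimited.ts (illustrative): a raw-TCP "server" whose response has
// "Connection: close" and no Content-Length. With such framing the client can
// only know the body is complete when the server closes the connection,
// exactly the HTTP/1.0-style behaviour described above.
import net from "node:net";

const server = net.createServer((socket) => {
  socket.once("data", () => {
    socket.write(
      "HTTP/1.1 200 OK\r\n" +
        "Connection: close\r\n" + // deliberately no Content-Length
        "Content-Type: text/plain\r\n" +
        "\r\n"
    );
    socket.write("hello");
    // Closing the connection is what marks the end of the body.
    socket.end();
  });
});

server.listen(3001, () => console.log("listening on :3001")); // example port
```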
which is not really true... The TechEmpower benchmark is total junk - but people are still interested :D |
There's some truth to that, but I think there is some confusion between two sources of incorrectness:
From my understanding,
(is this right?) In my opinion, one of the strong points of this project is the simplicity of its application layer, which minimizes the noise from the application implementation. |
As we know (at least it is documented), we have some performance gap, for example between bun and node. In my opinion, the first step to stabilizing results is to find the right tool / methodology to get closer to this. When I have time, I will create a tool that outputs results from various load generators, perhaps oha, bombardier and some others, and I will publish the results here as markdown so we can analyse them together. I'll still publish results; at least we use the same tooling as techempower 😛 |
PS: We are still not in production conditions, since this is local docker ... techempower uses real networks and real machines |
This is a time waste.
|
@waghanza, so, this raises a key question: is any other work being done on the computer where Docker is installed for the benchmarks? For example, are you watching a movie or (God forbid) playing a game while the benchmark results are being prepared? Please just answer honestly =). These three Ditsmod benchmark results strongly suggest it: 172,574 - 174,300 - 274,841. |
It's funny that we went full circle on this 😄
@KostyaTretyak if you take a look at the Go page on 2025-03-27, 2025-03-26 and 2025-02-18, you can notice that they show reproducibly bad results for specific (not random) frameworks. So this cannot be random noise from other tasks; otherwise different runs would show the noise on different frameworks, but it's always the same ones. This really went full circle 😅 |
@akyoto, when something radically changes, like adding the "Connection: close" header, it is natural for the results to change compared to 2024-12. I don't see what you're saying about how supposedly only specific frameworks show deviations from 2024-12. In my opinion, there are both types of deviations: random and non-random. For example, the incorrect difference (Node vs. Bun) in the performance of frameworks like Express, Fastify, Koa... These are specific frameworks. And then there's also random deviation, like in the case of Ditsmod, where two results show lower numbers and the third result suddenly spikes by 60%. |
@KostyaTretyak I see what you're saying but when everything about results from 2025 smells (keep-alive or not), then I would try to run the benchmarks on a different machine & OS or without potentially buggy virtualization software on a remote machine and see if the noise persists there. I can try your ditsmod later on my system and post the results together with express to see if we get stable ratios there. @waghanza Also can we please address the elephant in the room: |
The whole project is open sourced |
I have done exactly that. |
You can see the results on Techempower.
It’s a pity that their "composite score" tab is gone. You could see all the results summarized there. |
TechEmpower is a terrible source, I'd rather not look at a benchmark that allows cheating in plain sight, resulting in completely useless numbers for the end-user. Thanks but I'll test it on my own. |
@akyoto , consider the context in which I gave you the link to TechEmpower. You were going to check the stability of Ditsmod's performance. So you can even ignore the comparison with other frameworks, just look at the stability of the performance. |
@akyoto I'll read all comments carefully later, but
is not true. Sorry if I let you think that; the results run on an M1, not on macOS, but on https://asahilinux.org/ |
When discussing the basis on which benchmarks are conducted, I can add that at the end of last year, I deployed a project locally and ran benchmarks on my computer. The instructions for launching are available in the readme. I compared about a dozen frameworks, and the percentage-based results between them were close to what was observed in the official tests. |
@akyoto, I think |
Let's put that aside. Your point was that the current results are incorrect. But the results on my machine seem to indicate extremely similar ratios to the results on the website. |
Is this your "extremely similar ratios"? |
@KostyaTretyak Yeah, that does seem a little suspicious. To be fair, there is also a big difference in CPU architectures here, which could explain it. Imagine this example: your framework's working set is just slightly larger than the instruction/data cache size of his CPU. I'm not saying that this is what's happening, but it's one of the many examples of how ratios on other CPUs can differ. |
Both
And also, on your own benchmark, the ratio (7200/3500) ~ 2x is close to what is published on the website… |
But I was pointing out a different issue. About the ratio - that was a response, not a remark. Besides that (I thought it was obvious), by |
I read all the posts here carefully. The best idea, imho, is not to compare with techempower or any other tool, but to:
|
My best guess is that it's the difference between x86-64 and arm64 that is at fault here.
I would be very interested in seeing official benchmark results on a Linux x86-64 machine. I think it would clear a lot of the confusion about "why" the results don't match existing expectations.
Sorry, forget that; I will run on both arches 😛
I updated results on x86_64 -> https://web-frameworks-benchmark.netlify.app/result?f=express,nestjs-express,hyper-express,express-bun Still working on something to compare tools / arch to help us find an accurate toolset |
The first inaccuracy, which is immediately noticeable, is the abnormally large difference between benchmarks of the same frameworks on node.js and on bun.js. For example, ditsmod on node.js shows 3.5K requests per second, while on bun.js it shows 99K. In fact, ditsmod v3 on bun.js runs 1.5-2 times faster, but definitely not 30 times faster.
Even if we take the results for node.js alone, the benchmarks are still clearly incorrect. For example, nestjs-express shows even better results than bare express, which is impossible if both frameworks are running on the same version of node.js. Correct results should show that express is always 15-20% faster than nestjs-express.
In addition, express should show more than half the performance of ditsmod or fastify.