
wasmparser: Should we put 128-bit SIMD behind a crate feature? #1899

Open
Robbepop opened this issue Nov 9, 2024 · 8 comments · May be fixed by #1903

@Robbepop (Contributor) commented Nov 9, 2024

With the recent movement to put some major Wasm proposals behind crate features, e.g. component-model and gc, I was wondering whether this would also be practical or useful for the very large 128-bit simd (and thus also relaxed-simd) proposal with its tons of opcodes (simd: 235, relaxed-simd: 19).

I see plenty of Wasm runtimes that do not (yet) support simd, and it isn't even clear to me whether simd is supposed to become an optional component of Wasm in the future, given its massive size and the outlook of the more elaborate flexible-vectors proposal on the (far) horizon. For Wasmi I am starting to lean towards supporting the Wasm simd proposal soonish, but behind a crate feature that is disabled by default, since it is just way too massive.

The hope is that putting the simd and relaxed-simd features behind a single simd crate feature for wasmparser would improve compile times and artifact sizes, and maybe (or hopefully) also parse and validation performance, since certain checks could be performed at compile time instead.

Ideas, thoughts and discussion welcome. :)

@alexcrichton (Member)

One point perhaps worth clarifying is that the gc feature is for Wasmtime, not wasm-tools. I bring this up because, while wasmparser has a component-model feature, one of the main things that made it possible is that the component model is purely an optional layer on top of core wasm. That made it quite easy to have structs "just disappear" from the API when component-model support is disabled, and from an end-user perspective it's easy to work with the crate both with and without component-model support.

This is a bit different for core wasm proposals because they are pervasively present throughout the wasm type system and the structures exposed by this crate. For example, the Visitor trait and Operator enum would be entirely different with and without the gc or simd features. To me this would be a major sticking point of adding a feature gate here, because the common workarounds, default trait methods or #[non_exhaustive], aren't great in my opinion: they remove the exhaustiveness checking done by rustc.

One possible option perhaps is something like:

trait Visitor {
    // Opt-in hook: visitors that support simd return their simd sub-visitor here.
    fn simd_visitor(&mut self) -> Option<&mut dyn SimdVisitor> { None }
}

trait SimdVisitor {
    // ... all simd methods
}

for the trait side of things and maybe enum Operator { Simd(SimdOperator), ... } for Operator, but there's still a fair bit of design work to do there to determine whether that'd work.
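
To make the second half of that concrete, here is a minimal sketch of the enum side (names and variants are illustrative only, not an actual wasmparser API):

// Hypothetical sketch: all simd/relaxed-simd operators live in their own enum,
// which could be compiled out entirely behind a `simd` crate feature.
enum SimdOperator {
    V128Not,
    I8x16Add,
    // ... the remaining ~250 simd and relaxed-simd operators
}

enum Operator {
    I32Add,
    // ... all other non-simd operators
    Simd(SimdOperator),
}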


Overall, though, I'd think the high-order bit here is runtime performance. Compile time and binary size are nice, but IMO runtime performance is the most important. Do you perhaps have a benchmark showing that wasmparser (probably locally edited to strip simd support) is measurably faster without it? If so, that'd make this more urgent I think, but I'd also naively expect that removing simd wouldn't have much of a performance impact. The gc proposal is different in that it affects the type system, but I think we can improve perf there without going the route of compile-time features, perhaps.

@alexcrichton (Member)

Another option perhaps is to more radically redesign things. Right now WasmFeatures is a bitflags type, but if it were instead changed to a trait with methods returning bool, then that could be statically monomorphized and constant-folded wherever a method returns false. The main problem with this is that it would be quite a lot of type parameters everywhere for what (I think) is a relatively niche benefit.
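
A minimal sketch of that shape, purely for illustration (these are not wasmparser's actual trait or method names, and a real change would have to thread the type parameter through the parser and validator):

// Hypothetical sketch: feature queries become statically dispatched methods.
trait Features {
    fn simd(&self) -> bool;
    fn gc(&self) -> bool;
    // ... one method per wasm feature
}

struct NoSimd;

impl Features for NoSimd {
    // Statically `false`, so monomorphized simd checks can be constant-folded away.
    fn simd(&self) -> bool { false }
    fn gc(&self) -> bool { true }
}

fn simd_enabled<F: Features>(features: &F) -> bool {
    // With `NoSimd` this is statically known to be false after monomorphization.
    features.simd()
}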

Brainstorming more, yet another possible option here could be controlling defaults for wasm features via Cargo features. Right now if you disable the features Cargo feature of wasmparser it'll const-propagate the default value for all wasm features, but this is too expansive for the use case you're interested in (e.g. it includes gc by default since it's stage 4+). That could in theory be controlled with Cargo features as well, such as features = ['default-off-gc'] or something like that. That pushes much of the issue to compile time but might be a viable way to "have a trait" without actually plumbing a trait everywhere.
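
Roughly, the shape could be something like the following sketch (default-off-gc is the placeholder feature name from above, and the helper function is illustrative rather than wasmparser's actual API):

// Hypothetical sketch: an opt-out Cargo feature trims the default feature set.
fn default_features() -> WasmFeatures {
    let mut defaults = WasmFeatures::default();
    if cfg!(feature = "default-off-gc") {
        // cfg! expands to a compile-time constant, so this branch is resolved
        // at compile time and the gc flag drops out of the default set.
        defaults.remove(WasmFeatures::GC);
    }
    defaults
}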

@Robbepop (Contributor, Author) commented Nov 10, 2024

@alexcrichton thank you for the fast response!

To be honest, I am mostly interested in compile time and artifact size improvements, since wasmparser is by far Wasmi's largest dependency. For Wasmtime, in contrast, wasmparser's compile time and artifact size are probably pretty much negligible, so I can understand the different motivations.

I too doubt that this would improve runtime performance significantly. But if runtime performance is the main motivator for this change from the viewpoint of the Bytecode Alliance maintainers, a performance analysis should be done. I think it is mostly the parsing and validation logic for all these numerous SIMD operators whose removal could be a huge relief for the Rust compiler and for artifact sizes.

You are right that this change is probably more intrusive than the component-model crate feature, and I was also wondering how bad it would get. Probably the only way of finding out is implementing it ... :S

The idea about the WasmFeatures trait is interesting to me. Maybe it could actually lead to a more flexible codebase.
Having to put #[non_exhaustive] on the gigantic Operator enum is indeed not ideal (as usual), but since the enum is already so huge, I don't think it would have a major impact on users.

Moving all SIMD operators out into their own enum is also an interesting idea; I wonder how user-friendly this would be in the end.


Maybe, if we combine the enum SimdOp with the trait VisitSimd, we could get far with this.
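
A rough sketch of how those two pieces could fit together on the consumer side, reusing the hypothetical names from the earlier sketches:

// Hypothetical sketch: simd operators are forwarded to the opt-in sub-visitor.
fn dispatch(visitor: &mut dyn Visitor, op: Operator) {
    match op {
        Operator::I32Add => { /* handle non-simd operators directly */ }
        // ... all other non-simd operators
        Operator::Simd(simd_op) => {
            if let Some(simd) = visitor.simd_visitor() {
                // Forward to the simd visitor's methods, e.g. simd.visit(simd_op).
                let _ = (simd, simd_op);
            }
        }
    }
}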

@alexcrichton (Member)

Ah, ok, makes sense! Sorry, I don't mean to belittle compile-time or binary-size improvements at all. I personally prioritize runtime, which is why I originally focused on that, but I agree it's important to focus on the other bits too!

Would you be interested in deleting all the simd bits and seeing whether it makes a measurable difference in compile time or binary size? I think it could help in both cases, but my prediction would be something in the single-digit-percentage range.

I do agree that a #[non_exhaustive] enum Operator probably isn't so bad given its current size. It'd be unfortunate that in Wasmtime, for example, we'd no longer have exhaustiveness checking, but I also don't think that's the end of the world. In that sense, if something reasonable comes out of the trait-based approach I'd be happy to review it! That should be pretty extensible to the gc proposal, for example, or at least that family of proposals, which I think would be useful as well.
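
For context, this is the tradeoff being referred to, sketched from a consumer's point of view (illustrative only):

// Once Operator is #[non_exhaustive] (or simd hides behind a catch-all variant),
// downstream matches need a wildcard arm, so rustc can no longer point out
// newly added operators at compile time.
fn handle(op: &Operator) {
    match op {
        Operator::I32Add => { /* ... */ }
        // ... every other operator the consumer handles
        _ => unreachable!("unhandled operator"),
    }
}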

@Robbepop (Contributor, Author) commented Nov 11, 2024

Thank you for your understanding. Yes, I will remove all SIMD-related code as a test and report back here about the changes in compile time, runtime, and artifact size. The best outcome would be that nothing changes, meaning there was nothing to do. :)

@Robbepop (Contributor, Author) commented Nov 11, 2024

@alexcrichton I conducted the experiment here: https://github.com/Robbepop/wasm-tools/tree/rf-remove-simd-related-code

Feel free to check whether I correctly stripped all the SIMD-related code.

Results (MacBook M2 Pro)

Default Features

cargo clean && time cargo build -p wasmparser
                      Compile Time   Artifact Size (.rlib)
main (debug)          5.56s          33MB
PR (debug)            4.94s          31MB
main (release)        6.30s          13MB
PR (release)          5.70s          11MB
main (production)     9.34s          14MB
PR (production)       9.05s          12MB
main (production-z)   8.93s          11MB
PR (production-z)     8.03s          9.1MB

Embedded Features

cargo clean && time cargo build -p wasmparser --no-default-features --features std,validate,features
                      Compile Time   Artifact Size (.rlib)
main (debug)          3.43s          22MB
PR (debug)            2.98s          19MB
main (release)        4.28s          10MB
PR (release)          3.61s          8.4MB
main (production)     5.22s          11MB
PR (production)       4.50s          9.3MB
main (production-z)   4.93s          9.1MB
PR (production-z)     4.34s          7.3MB

Conclusion

Smaller improvements than I had hoped for, but still significant.

Compile times decreased by roughly 10-15%.
Artifact size decreased by roughly 15-20%.

Profiles Used

[profile.production]
inherits = "release"
lto = "fat"
codegen-units = 1

[profile.production-z]
inherits = "release"
opt-level = "z"
codegen-units = 1

@alexcrichton (Member)

Wow, yeah, that's a pretty hefty chunk and honestly pretty surprising to me. I'd have expected a much smaller impact, and those savings definitely seem worth it time- and space-wise.

That's compelling enough to me that I think it'd be worth exploring this more and seeing how it pans out; would you be interested in doing so?

@Robbepop (Contributor, Author)

Okay this sounds motivating. Yes, I will work on this.

Robbepop linked a pull request (#1903) on Nov 12, 2024 that will close this issue.