-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use FFHT code #5
Comments
I'll compare that with my SSE2 SIMD code, thanks for the link. There are a bunch of things you can do with the WHT that are not well known. Mainly stemming from fast random projection algorithms based on the WHT: |
When I ran the the FFHT bench-marking code on my cheap laptop I got 5200 65536-point WHT's per second per core. In my code I get 5000 per core. My legacy SSE code: https://drive.google.com/open?id=1bhQi4BybXPkvCENIYQcdw9PHAwS9Ocam I see all the links I have given are broken if you just click on them. It seems you have to copy and paste them to the browser address line for them to work! Anyway to get GPU comparable results you would definitely need say one of the new AMD 32 core CPUs. From the FFHT result tables that would get you to around 1,000,000 65536-point WHT's per second. A big advantage of CPUs is they are so much easier to program than GPUs. I don't know what the situation is with the new GPU tensor cores. If you could use them for the WHT you might reach 10 million+ 65536-point WHTs per second presuming the memory bandwidth could handle that. That would be 1000 times better than I can do on my 2 core laptop (no harddrive, broken keyboard, CPU heatsink problems, runs off cheap usb stick etc.) I could evolve a neural network in 86.4 seconds that takes me a day at present. |
I did a quick benchmark for the 1D case, it's probably not perfect but suggests that FFHT is one order of magnitude faster, even a bit more in low dimension. Here are the results (Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz). I create each time a matrix |
I would continue to use FFTW (because it probably handles more cases), and add FFHT to use it for the cases it supports. Would need to first make a BinaryBuilder package of FFHT. |
The https://github.com/FALCONN-LIB/FFHT repository implements a specialized SIMD FHT, which it says is faster than FFTW for this problem. (Totally possible, since 2×2×2×⋯ multidimensional transforms are not that optimized in FFTW — they are using size-2 DFT kernels, whereas to be fast they should really use larger multidimensional kernels.) Might be interesting to support using FFHT (which is MIT-licensed) in cases where it is applicable.
The text was updated successfully, but these errors were encountered: