Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use FFHT code #5

Open
stevengj opened this issue Jun 21, 2018 · 4 comments
Open

use FFHT code #5

stevengj opened this issue Jun 21, 2018 · 4 comments

Comments

@stevengj
Copy link
Member

stevengj commented Jun 21, 2018

The https://github.com/FALCONN-LIB/FFHT repository implements a specialized SIMD FHT, which it says is faster than FFTW for this problem. (Totally possible, since 2×2×2×⋯ multidimensional transforms are not that optimized in FFTW — they are using size-2 DFT kernels, whereas to be fast they should really use larger multidimensional kernels.) Might be interesting to support using FFHT (which is MIT-licensed) in cases where it is applicable.

@ghost
Copy link

ghost commented Aug 8, 2018

I'll compare that with my SSE2 SIMD code, thanks for the link. There are a bunch of things you can do with the WHT that are not well known. Mainly stemming from fast random projection algorithms based on the WHT:
https://github.com/S6Regen?tab=repositories
It also seems the very simple out of place WHT is hardly known, where you go through the input vector pair-wise sequentially and put the sum of each pair sequentially in the lower half of a new array, and the difference sequentially in the upper half of the new array. Repeat for a total log base 2 n times. That is the simplest algorithm in computer science least known.
http://randomprojectionai.blogspot.com/

@ghost
Copy link

ghost commented Aug 10, 2018

When I ran the the FFHT bench-marking code on my cheap laptop I got 5200 65536-point WHT's per second per core. In my code I get 5000 per core.

My legacy SSE code: https://drive.google.com/open?id=1bhQi4BybXPkvCENIYQcdw9PHAwS9Ocam

I see all the links I have given are broken if you just click on them. It seems you have to copy and paste them to the browser address line for them to work!

Anyway to get GPU comparable results you would definitely need say one of the new AMD 32 core CPUs. From the FFHT result tables that would get you to around 1,000,000 65536-point WHT's per second. A big advantage of CPUs is they are so much easier to program than GPUs. I don't know what the situation is with the new GPU tensor cores. If you could use them for the WHT you might reach 10 million+ 65536-point WHTs per second presuming the memory bandwidth could handle that. That would be 1000 times better than I can do on my 2 core laptop (no harddrive, broken keyboard, CPU heatsink problems, runs off cheap usb stick etc.) I could evolve a neural network in 86.4 seconds that takes me a day at present.

@Djoop
Copy link

Djoop commented Nov 23, 2018

I did a quick benchmark for the 1D case, it's probably not perfect but suggests that FFHT is one order of magnitude faster, even a bit more in low dimension. Here are the results (Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz).

results

I create each time a matrix A=randn(d,1000), and compare @belapsed fwht_natural!($(A),dims=1) and running fht_double on each column of A (with a simple loop, one can probably parallelize it.). They also provide an out-of-place version, which I guess is better than first copying.
For reference, there is also the Spiral version (http://www.spiral.net/software/wht.html), and this other SIMD C++ implementation : https://github.com/curto2/mckernel/ , but I didn't have time to compare.
So I don't know what is the best option : using FFHT only, using FFHT+FFTW for unsuported cases (maybe multidimensional cases?), or even creating a different package for FFHT?

@stevengj
Copy link
Member Author

stevengj commented Nov 24, 2018

I would continue to use FFTW (because it probably handles more cases), and add FFHT to use it for the cases it supports. Would need to first make a BinaryBuilder package of FFHT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants