I benchmarked it on Intel, and it was indeed quite fast/a good improvement over the scalar version. Will be interesting to try that on AMD.
I benchmarked it on Intel, and it was indeed quite fast/a good improvement over the scalar version. Will be interesting to try that on AMD.