The Intel optimization manual has a fun example where they use vpconflict for ve...

The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...

I benchmarked it on Intel, and it was indeed quite fast/a good improvement over the scalar version. Will be interesting to try that on AMD.