Adapt commit 982b596e for the arm and aarch64 NEON asm. 5-10% faster on Cortex-A9.