The x86 version processes 16 floats per iteration, so len must be a multiple of 16.
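If callers cannot guarantee that len is a multiple of 16, one common workaround is to run the 16-wide kernel on the largest multiple-of-16 prefix and finish the remainder with a scalar loop. The sketch below is only an illustration: the function names, the in-place scaling operation, and the scalar inner loop standing in for the SIMD body are all assumptions, not the actual x86 code.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 16 };  /* floats consumed per iteration, as stated above */

/* Hypothetical stand-in for the x86 kernel: scales data in place,
 * 16 floats per iteration. The real operation is not known here. */
static void scale_blocks16(float *data, size_t len, float factor) {
    assert(len % BLOCK == 0);  /* precondition: len is a multiple of 16 */
    for (size_t i = 0; i < len; i += BLOCK) {
        /* Scalar placeholder for the SIMD body of one iteration. */
        for (size_t j = 0; j < BLOCK; ++j) {
            data[i + j] *= factor;
        }
    }
}

/* Hypothetical wrapper that accepts any len: block kernel on the
 * prefix, plain scalar loop on the last len % 16 elements. */
static void scale_any_len(float *data, size_t len, float factor) {
    size_t main_len = len - (len % BLOCK);  /* largest multiple of 16 <= len */
    scale_blocks16(data, main_len, factor);
    for (size_t i = main_len; i < len; ++i) {
        data[i] *= factor;
    }
}
```

With len = 37, for example, the block kernel would cover elements 0 to 31 and the scalar loop the remaining 5. Padding the buffer up to the next multiple of 16 is the other usual option, if the extra elements are harmless for the operation.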