

@acdimalev
Last active June 25, 2020 06:56

Instead of this, computing each result directly in its destination register (q10 and q11)...

vmul.f32  q10, q15, q9    @ q10 = q15 * q9
vmul.f32  q11, q13, q9    @ q11 = q13 * q9
vfma.f32  q10, q3, q8     @ q10 += q3 * q8 (fused)
vfma.f32  q11, q14, q8    @ q11 += q14 * q8 (fused)

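GCC is doing this, computing each result in the scratch register q12 and then copying it into q10 or q11 with vorr (vorr qd, qm, qm is the instruction behind vmov qd, qm)...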

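vmul.f32  q12, q15, q9    @ q12 = q15 * q9
vfma.f32  q12, q3, q8     @ q12 += q3 * q8 (fused)
vorr  q10, q12, q12       @ q10 = q12 (register copy)
vmul.f32  q12, q13, q9    @ q12 = q13 * q9
vfma.f32  q12, q14, q8    @ q12 += q14 * q8 (fused)
vorr  q11, q12, q12       @ q11 = q12 (register copy)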
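The gist does not include the C source that produced these listings. As a rough illustration only, here is a hypothetical loop of the same shape: each iteration computes two two-term multiply-adds and writes them to adjacent (interleaved) elements of out, which is the kind of grouped store GCC may vectorize with a NEON store-lanes instruction (vst2). The function name mix2, its parameters, the mapping to q registers in the comments, and the compile flags (something like -O3 -mfpu=neon-vfpv4 -ffast-math for 32-bit ARM) are all assumptions; this has not been checked to reproduce the exact sequence above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reproducer, not from the original gist.
   Each iteration produces two results stored at out[2*i] and out[2*i + 1],
   i.e. an interleaved group of two stores. With floating-point contraction,
   a*b + c*d can become a multiply followed by a fused multiply-add. */
void mix2(float *restrict out,
          const float *restrict a0, const float *restrict b0,
          const float *restrict a1, const float *restrict b1,
          const float *restrict k0, const float *restrict k1,
          size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* roughly: q10 = q15 * q9 + q3 * q8 (guessed mapping) */
        out[2 * i]     = a0[i] * k1[i] + b0[i] * k0[i];
        /* roughly: q11 = q13 * q9 + q14 * q8 (guessed mapping) */
        out[2 * i + 1] = a1[i] * k1[i] + b1[i] * k0[i];
    }
}
```

Compiling a loop like this with -S and looking for vst2 in the output is one way to check whether the store-lanes path is being taken.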

This appears to be the result of vectorizable_store in GCC's vectorizer (gcc/tree-vect-stmts.c in GCC 9.3.0)

https://github.com/gcc-mirror/gcc/blob/releases/gcc-9.3.0/gcc/tree-vect-stmts.c#L6328-L6330

on targets that load and store lanes of data (interleaved structure accesses such as NEON's vst2/vld2), where the vectors to be stored are gathered into a single vector array

https://github.com/gcc-mirror/gcc/blob/releases/gcc-9.3.0/gcc/tree-vect-stmts.c#L7212

and that array is immediately marked as clobbered, before anything is assigned to it

https://github.com/gcc-mirror/gcc/blob/releases/gcc-9.3.0/gcc/tree-vect-stmts.c#L7222

This results in the register allocator refusing to coalesce q12 with q10/q11, so each computation is performed in a separate register and then copied into place for no good reason.
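For reference, a two-way "store lanes" operation writes two vectors to memory interleaved: lane i of the first vector goes to out[2*i] and lane i of the second to out[2*i + 1]. That is why both results have to end up in one consecutive register group (q10:q11 here) for a single store; the store itself is presumably something like vst2.32 {d20-d23} but is not shown in the snippet. Below is a minimal scalar model of that effect (the semantics of the vst2q_f32 intrinsic); the struct float32x4x2_model is a hypothetical plain-C stand-in for NEON's float32x4x2_t, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 }; /* four 32-bit float lanes per q register */

/* Hypothetical stand-in for NEON's float32x4x2_t: two 4-lane vectors
   that must live in consecutive registers for a single vst2. */
typedef struct {
    float val[2][LANES];
} float32x4x2_model;

/* Scalar model of a two-way store-lanes operation (vst2q_f32):
   interleave the lanes of val[0] and val[1] into memory. */
static void store_lanes2(float *out, const float32x4x2_model *v)
{
    for (size_t i = 0; i < LANES; i++) {
        out[2 * i]     = v->val[0][i];
        out[2 * i + 1] = v->val[1][i];
    }
}
```

With val[0] = {1, 2, 3, 4} and val[1] = {10, 20, 30, 40}, out becomes {1, 10, 2, 20, 3, 30, 4, 40}.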
