@toivoh
Created October 31, 2014 21:02
Trying to get Julia to produce an unrolled loop with SIMD instructions in it
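module TestSIMD

# Empty type used to pass the trip count M as a compile-time type parameter.
immutable TypeConst{T} end

# XOR M elements of src into dest, starting just past the given offsets.
function innerloop!{T, M}(dest::Vector{T}, dest_ofs, src::Vector{T}, src_ofs, ::TypeConst{M})
    @simd for i=1:M
        @inbounds dest[i + dest_ofs] $= src[i + src_ofs]  # `$` was bitwise xor in Julia at the time
    end
end

# Print the native code generated for element type T and trip count n.
function showcode(T, n)
    println()
    @show T n
    println("===========")
    code_native(innerloop!, (Vector{T}, Int, Vector{T}, Int, TypeConst{n}))
end

showcode(Uint64, 8)
showcode(Uint64, 16)

end # module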
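
The gist above uses 2014-era syntax (`immutable`, `Uint64`, `$=`, and `f{T}(...)` method parameters), which current Julia no longer accepts. For readers who want to repeat the experiment, here is a minimal sketch of the same kernel, assuming Julia 1.x. The module name `TestSIMDModern` is hypothetical, and the built-in `Val{M}` is swapped in for the gist's `TypeConst{M}`. Nothing here is meant to suggest that the generated code will match the listings in this gist, since that depends on the Julia and LLVM versions and on the CPU.

```julia
module TestSIMDModern  # hypothetical name, not part of the original gist

using InteractiveUtils  # provides code_native outside the REPL

# XOR M elements of src into dest, starting just past the given offsets.
# The trip count M is carried in the type domain via Val{M} (an assumption
# replacing the gist's TypeConst{M}), so the compiler sees it as a constant.
function innerloop!(dest::Vector{T}, dest_ofs::Int,
                    src::Vector{T}, src_ofs::Int, ::Val{M}) where {T, M}
    @simd for i in 1:M
        @inbounds dest[i + dest_ofs] ⊻= src[i + src_ofs]
    end
    return dest
end

# Print the native code generated for element type T and trip count n.
function showcode(T, n)
    println()
    @show T n
    println("===========")
    code_native(innerloop!, (Vector{T}, Int, Vector{T}, Int, Val{n}))
end

end # module
```

Calling `TestSIMDModern.showcode(UInt64, 8)` and `TestSIMDModern.showcode(UInt64, 16)` would then print the two cases compared in the comment below.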
toivoh commented Nov 3, 2014

I'm trying to get Julia to emit an inner loop that is unrolled and uses SIMD instructions, but it seems that I can't have both at the same time. (Trying to optimize https://gist.github.com/toivoh/c9a1f1e064396bdf3447)

With n = 8, I get a fully unrolled loop (eight scalar mov/xor pairs) but no SIMD instructions:

T = Uint64
n = 8
===========
    .text
Filename: /home/toivo/code/julia/xorshift/TestSIMD.jl
Source line: 23
    push    RBP
    mov RBP, RSP
Source line: 23
    mov RAX, QWORD PTR [RDX + 8]
    mov R8, QWORD PTR [RAX + 8*RCX]
    mov RDX, QWORD PTR [RDI + 8]
    xor QWORD PTR [RDX + 8*RSI], R8
    mov RDI, QWORD PTR [RAX + 8*RCX + 8]
    xor QWORD PTR [RDX + 8*RSI + 8], RDI
    mov RDI, QWORD PTR [RAX + 8*RCX + 16]
    xor QWORD PTR [RDX + 8*RSI + 16], RDI
    mov RDI, QWORD PTR [RAX + 8*RCX + 24]
    xor QWORD PTR [RDX + 8*RSI + 24], RDI
    mov RDI, QWORD PTR [RAX + 8*RCX + 32]
    xor QWORD PTR [RDX + 8*RSI + 32], RDI
    mov RDI, QWORD PTR [RAX + 8*RCX + 40]
    xor QWORD PTR [RDX + 8*RSI + 40], RDI
    mov RDI, QWORD PTR [RAX + 8*RCX + 48]
    xor QWORD PTR [RDX + 8*RSI + 48], RDI
    mov RAX, QWORD PTR [RAX + 8*RCX + 56]
    xor QWORD PTR [RDX + 8*RSI + 56], RAX
Source line: 64
    pop RBP
    ret

With n = 16, I get SIMD instructions but no unrolling. The loop body does a single 128-bit xorps (two Uint64 values) per iteration and runs eight times:

T = Uint64
n = 16
===========
    .text
Filename: /home/toivo/code/julia/xorshift/TestSIMD.jl
Source line: 64
    push    RBP
    mov RBP, RSP
    shl RCX, 3
    add RCX, QWORD PTR [RDX + 8]
    shl RSI, 3
    add RSI, QWORD PTR [RDI + 8]
    mov EAX, 16
    movups  XMM0, XMMWORD PTR [RSI]
    movups  XMM1, XMMWORD PTR [RCX]
    xorps   XMM1, XMM0
    movups  XMMWORD PTR [RSI], XMM1
    add RCX, 16
    add RSI, 16
    add RAX, -2
    jne -30
Source line: 64
    pop RBP
    ret
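
Reading the listings by eye gets tedious when comparing many values of n. As a rough aid, the check can be automated by capturing the `code_native` output as a string and searching it for vector register names. The following is only a sketch, assuming Julia 1.x on x86. The names `xor_kernel!`, `native_text`, and `uses_vector_registers` are hypothetical helpers, and the register-name match is a heuristic rather than proof of vectorization: scalar floating-point code also uses XMM registers, so it is mainly informative for integer kernels like this one.

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Hypothetical stand-in for the gist's kernel: XOR the first M elements of
# src into dest, with M passed in the type domain via Val{M}.
function xor_kernel!(dest::Vector{T}, src::Vector{T}, ::Val{M}) where {T, M}
    @simd for i in 1:M
        @inbounds dest[i] ⊻= src[i]
    end
    return dest
end

# Hypothetical helper: capture the native code for f(types...) as a String.
native_text(f, types) = sprint(io -> code_native(io, f, types))

# Hypothetical heuristic (x86 only): does the listing mention any
# xmm/ymm/zmm register? Case-insensitive to cover both assembly syntaxes.
uses_vector_registers(f, types) =
    occursin(r"[xyz]mm\d+"i, native_text(f, types))

# Report the heuristic for a few trip counts.
function report(T, ns)
    for n in ns
        hit = uses_vector_registers(xor_kernel!, (Vector{T}, Vector{T}, Val{n}))
        println("n = ", n, ": vector registers mentioned = ", hit)
    end
end
```

Running `report(UInt64, (8, 16, 32))` prints one line per trip count. Whether a case is also unrolled still has to be judged from the listing itself.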
