Created
April 16, 2015 09:45
-
-
Save simonbyrne/2a7a3bdefeb75d2d60ac to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Inside Julia\n", | |
"\n", | |
"### Simon Byrne\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"# About Me\n", | |
"\n", | |
"* I'm a researcher in statistics at University College London.\n", | |
"* Focus on a class of algorithms known as Markov chain Monte Carlo (MCMC).\n", | |
" * Largely theoretical analysis of computational techniques.\n", | |
" * but occasionally I need to write some code.\n", | |
" * Sequential algorithms: can't be vectorised\n", | |
"\n", | |
"# Why Julia?\n", | |
"\n", | |
"* Quick to write: easy to try out ideas.\n", | |
"* Reasonably performant, but representative of the speed of the algorithm.\n", | |
"* Easy to understand: if I need to look at the code 6 months later.\n", | |
"* Able to peek under the hood.\n", | |
"\n", | |
"# How I met Julia\n", | |
"\n", | |
"> Just to never write another .mex file, I'll very seriously consider Julia for new projects\n", | |
"> \n", | |
"> – <cite>Justin Domke (September 2012)</cite>\n", | |
"\n", | |
"I had Matlab code full of:\n", | |
"```\n", | |
"R = chol(A) \n", | |
"y = R \\ ( R' \\ x)\n", | |
"```\n", | |
"\n", | |
"I really liked being able to write Julia code:\n", | |
"```\n", | |
"C = cholfact(A) \n", | |
"y = C \\ x\n", | |
"```\n", | |
"(and it was faster).\n", | |
"\n", | |
"I filed my first pull request 3 weeks later." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# My Contributions\n", | |
"\n", | |
"* Base Julia: mostly numeric related\n", | |
" * Numeric comparisons\n", | |
" * Linear algebra\n", | |
" * Various mathematical functions\n", | |
" \n", | |
"* Stats\n", | |
" * StatsBase.jl: basic statistics functionality\n", | |
" * Distributions.jl: probability distributions\n", | |
" * KernelDensity.jl: kernel density estimators\n", | |
"* Other\n", | |
" * AppleAccelerate.jl: an interface to Apple's Accelerate library\n", | |
" * InplaceOps.jl: avoids unnecessary allocation of arrays\n", | |
" * DoubleDouble.jl: extended precision arithmetic without BigFloats" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Numeric comparisons\n", | |
"or, why does:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"true" | |
] | |
}, | |
"execution_count": 24, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"0.1 > 1//10" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.100000000000000005551115123125782702118158340454101562500000" | |
] | |
} | |
], | |
"source": [ | |
"@printf \"%.60f\" 0.1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.30000000000000004" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"0.1+0.2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Few other languages let you compare numbers of different types.\n", | |
"\n", | |
"The rough idea:\n", | |
"* every non-NaN `Real` value corresponds to exactly 1 real number, or an infinity.\n", | |
"* all non-NaN `Real` values can be compared correctly." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": false | |
}, | |
"source": [ | |
"## Float vs Integer comparisons" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"function <(x::Float64, y::Int64)\n", | |
" fy = Float64(y)\n", | |
" (x < fy) | ((x == fy) & (fy < Float64(typemax(Int64))) & (unsafe_trunc(Int64,fy) < y))\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Why is this so complicated?\n", | |
"\n", | |
"Neither type is a subset of the other.\n", | |
"\n", | |
"* `Float64` can only represent all integers up to 2<sup>53</sup>.\n", | |
"* Any integers above this may be rounded when converted." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"9007199254740993" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"2^53+1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"9007199254740992" | |
] | |
} | |
], | |
"source": [ | |
"@printf \"%i\" Float64(2^53+1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"So the rough idea is:\n", | |
"1. Convert to `Float64`, and compare the values\n", | |
"2. If they are not equal, then the comparison is correct.\n", | |
"3. If they are equal, then `y` must be an integer, so can be compared as integers\n", | |
" * `unsafe_trunc` is fast: it doesn't throw exceptions (same as a C cast).\n", | |
"4. Extra check is for integers that can't be round-tripped (ones that get rounded to 2<sup>63</sup>)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"9223372036854775807" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"y = typemax(Int64) # 2^63-1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"9223372036854775808" | |
] | |
} | |
], | |
"source": [ | |
"fy = Float64(y)\n", | |
"@printf \"%i\" fy # rounded to 2^64" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"ename": "LoadError", | |
"evalue": "LoadError: InexactError()\nwhile loading In[32], in expression starting on line 1", | |
"output_type": "error", | |
"traceback": [ | |
"LoadError: InexactError()\nwhile loading In[32], in expression starting on line 1", | |
"", | |
" in call at base.jl:36" | |
] | |
} | |
], | |
"source": [ | |
"Int64(fy)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"-9223372036854775808" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"unsafe_trunc(Int64,fy)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## So what happens to your functions?\n", | |
"\n", | |
"You can see the underlying Julia code using `@less`. But what happens after?\n", | |
"\n", | |
"### 0. Parsing\n", | |
"* Syntax is parsed (`parse` function): you can also see what this looks like by using `quote`, and playing around with the `Expr` object.\n", | |
"* Macros are expanded: try out `macroexpand` on a quoted macro.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
":(1.0 < 2)" | |
] | |
}, | |
"execution_count": 34, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"e = parse(\"1.0 < 2\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
":comparison" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"e.head" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Any,1}:\n", | |
" 1.0\n", | |
" :<\n", | |
" 2 " | |
] | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"e.args" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"quote \n", | |
" #308#out = Base.Printf.STDOUT\n", | |
" #309###x#2686 = 0.1\n", | |
" local #314#neg, #313#pt, #312#len, #307#exp, #310#do_out, #311#args\n", | |
" if Base.Printf.isfinite(#309###x#2686)\n", | |
" (#310#do_out,#311#args) = Base.Printf.fix_dec(#308#out,#309###x#2686,\"\",0,60,'f')\n", | |
" if #310#do_out\n", | |
" (#312#len,#313#pt,#314#neg) = #311#args\n", | |
" #314#neg && Base.Printf.write(#308#out,'-')\n", | |
" Base.Printf.print_fixed(#308#out,60,#313#pt,#312#len)\n", | |
" end\n", | |
" else \n", | |
" Base.Printf.write(#308#out,begin # printf.jl, line 143:\n", | |
" if Base.Printf.isnan(#309###x#2686)\n", | |
" \"NaN\"\n", | |
" else \n", | |
" if (#309###x#2686 Base.Printf.< 0)\n", | |
" \"-Inf\"\n", | |
" else \n", | |
" \"Inf\"\n", | |
" end\n", | |
" end\n", | |
" end)\n", | |
" end\n", | |
" Base.Printf.nothing\n", | |
"end" | |
] | |
}, | |
"execution_count": 37, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"macroexpand(:(@printf \"%.60f\" 0.1))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 1. `@code_lowered` \n", | |
"This is a proto-Julia syntax: a couple of things have been done:\n", | |
"* Syntax desugaring, e.g. \n", | |
" * expand `a+=b` to `a=a+b`\n", | |
" * substitute `a'*b` with `Ac_mul_B(a,b)`\n", | |
"* Control flow are replaced by simple \"goto\"-style statements.\n", | |
"* Resolve scope of functions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Any,1}:\n", | |
" :($(Expr(:lambda, Any[:x,:y], Any[Any[:fy],Any[Any[:x,:Any,0],Any[:y,:Any,0],Any[:fy,:Any,18]],Any[],0], :(begin # In[3], line 2:\n", | |
" fy = Float64(y)::ANY # line 3:\n", | |
" return (x < fy)::ANY | ((x == fy)::ANY & (fy < Float64(typemax(Int64)::ANY)::ANY)::ANY)::ANY & (unsafe_trunc(Int64,fy)::ANY < y)::ANY::ANY::ANY\n", | |
" end::ANY))))" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"@code_lowered 1.0 < 2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2. `@code_typed`\n", | |
"\n", | |
"The Julia magic happens\n", | |
"* type inference\n", | |
"* method sorting\n", | |
"* inlining, and other optimisations" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Any,1}:\n", | |
" :($(Expr(:lambda, Any[:x,:y], Any[Any[:fy],Any[Any[:x,Float64,0],Any[:y,Int64,0],Any[:fy,Float64,18]],Any[],Any[]], :(begin # In[3], line 2:\n", | |
" fy = (top(box))(Float64,(top(sitofp))(Float64,y::Int64)) # line 3:\n", | |
" return (top(box))(Bool,(top(or_int))((top(lt_float))(x::Float64,fy::Float64)::Bool,(top(box))(Bool,(top(and_int))((top(box))(Bool,(top(and_int))((top(eq_float))(x::Float64,fy::Float64)::Bool,(top(lt_float))(fy::Float64,(top(box))(Float64,(top(sitofp))(Float64,9223372036854775807)))::Bool)),(top(slt_int))((top(box))(Int64,(top(fptosi))(Int64,fy::Float64)),y::Int64)::Bool))))\n", | |
" end::Bool))))" | |
] | |
}, | |
"execution_count": 39, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"@code_typed 1.0 < 2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 3. `@code_llvm`\n", | |
"\n", | |
"`@code_llvm` gives the LLVM IR (intermediate representation). Think of it as a cross-platform, assembly-like language.\n", | |
"\n", | |
"The code is shown after various optimsation passes (constant propagation, dead code elimination, etc.).\n", | |
"\n", | |
"If you want to see what each pass does, you can start `julia-debug` with `JULIA_LLVM_ARGS = -print-after-all`. Beware, this will print a *LOT* of text." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define i1 @\"julia_<_43379\"(double, i64) {\n", | |
"top:\n", | |
" %2 = sitofp i64 %1 to double, !dbg !3108\n", | |
" %3 = fcmp ogt double %2, %0, !dbg !3110\n", | |
" %4 = fcmp oeq double %2, %0, !dbg !3110\n", | |
" %5 = fcmp olt double %2, 0x43E0000000000000, !dbg !3110\n", | |
" %6 = and i1 %4, %5, !dbg !3110\n", | |
" %7 = fptosi double %2 to i64, !dbg !3110\n", | |
" %8 = icmp slt i64 %7, %1, !dbg !3110\n", | |
" %9 = and i1 %6, %8, !dbg !3110\n", | |
" %10 = or i1 %3, %9, !dbg !3110\n", | |
" ret i1 %10, !dbg !3110\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_llvm 1.0 < 2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 4. `@code_native`\n", | |
"Further platform-dependent optimisations, generate the assembly instructions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\t.section\t__TEXT,__text,regular,pure_instructions\n", | |
"Filename: In[3]\n", | |
"Source line: 2\n", | |
"\tpushq\t%rbp\n", | |
"\tmovq\t%rsp, %rbp\n", | |
"Source line: 2\n", | |
"\tvcvtsi2sdq\t%rdi, %xmm0, %xmm1\n", | |
"\tvucomisd\t%xmm0, %xmm1\n", | |
"Source line: 3\n", | |
"\tseta\t%cl\n", | |
"\tsetnp\t%dl\n", | |
"\tsete\t%al\n", | |
"\tandb\t%dl, %al\n", | |
"\tmovabsq\t$13336766176, %rdx ## imm = 0x31AEEE6E0\n", | |
"\tvmovsd\t(%rdx), %xmm0\n", | |
"\tvucomisd\t%xmm1, %xmm0\n", | |
"\tseta\t%dl\n", | |
"\tandb\t%al, %dl\n", | |
"\tvcvttsd2si\t%xmm1, %rax\n", | |
"\tcmpq\t%rdi, %rax\n", | |
"\tsetl\t%al\n", | |
"\tandb\t%dl, %al\n", | |
"\torb\t%cl, %al\n", | |
"\tpopq\t%rbp\n", | |
"\tret\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_native 1.0 < 2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Does this mean that comparing integers and floats is computationally expensive?\n", | |
"\n", | |
"Not always." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"isneg (generic function with 1 method)" | |
] | |
}, | |
"execution_count": 42, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"isneg(x) = x < 0" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Any,1}:\n", | |
" :($(Expr(:lambda, Any[:x], Any[Any[],Any[Any[:x,:Any,0]],Any[],0], :(begin # In[42], line 1:\n", | |
" return x < 0::ANY\n", | |
" end::ANY))))" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"@code_lowered isneg(1.0)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define i1 @julia_isneg_43442(float) {\n", | |
"top:\n", | |
" %1 = fcmp olt float %0, 0.000000e+00, !dbg !3138\n", | |
" ret i1 %1, !dbg !3138\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_llvm isneg(1f0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Float vs MathConst comparisons" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 47, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"π = 3.1415926535897..." | |
] | |
}, | |
"execution_count": 47, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pi" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 48, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"MathConst{:π}" | |
] | |
}, | |
"execution_count": 48, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"typeof(pi)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3.141592653589793238462643383279502884197169399375105820974944592307816406286198e+00 with 256 bits of precision" | |
] | |
}, | |
"execution_count": 49, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"big(pi)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"true" | |
] | |
}, | |
"execution_count": 50, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"x = Float64(pi)\n", | |
"x < pi < nextfloat(x)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"<(x::MathConst, y::Float64) = Float64(x,RoundUp) <= y\n", | |
"<(x::Float64, y::MathConst) = x <= Float64(y,RoundDown)\n", | |
"\n", | |
"# this is called by Float64(c,r)\n", | |
"stagedfunction call{T<:Union(Float32,Float64),s}(t::Type{T},c::MathConst{s},r::RoundingMode)\n", | |
" f = T(big(c()),r())\n", | |
" :($f)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## So what is a `stagedfunction`?\n", | |
"\n", | |
"> `magicfunction`\n", | |
">\n", | |
"> Defines a magical function that shares similarities with warp drives\n", | |
"and matter replicators. Also, the arguments are the types, not values.\n", | |
">\n", | |
"> No it is not a macro.\n", | |
">\n", | |
"> Also, we don't use words like \"homoiconic\" in this household young man.\n", | |
">\n", | |
"> – <cite>Iain Dunning</cite>\n", | |
"\n", | |
"A `stagedfunction` consists of 2 stages:\n", | |
"* at compile time (i.e. when first called), it is evaluated with its arguments as types, and returning an expression.\n", | |
"* this is expression becomes the function for those types." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Any,1}:\n", | |
" :($(Expr(:lambda, Any[:(t::Any::ANY),:(c::Any::ANY),:(r::Any::ANY)], Any[Any[],Any[Any[:t,Type{Float64},0],Any[:c,MathConst{:π},0],Any[:r,Base.Rounding.RoundingMode{:Down},0]],Any[],Any[]], :(begin \n", | |
" return 3.141592653589793\n", | |
" end::Float64))))" | |
] | |
}, | |
"execution_count": 51, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"@code_typed Float64(pi,RoundDown)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Any,1}:\n", | |
" :($(Expr(:lambda, Any[:(t::Any::ANY),:(c::Any::ANY),:(r::Any::ANY)], Any[Any[],Any[Any[:t,Type{Float64},0],Any[:c,MathConst{:π},0],Any[:r,Base.Rounding.RoundingMode{:Up},0]],Any[],Any[]], :(begin \n", | |
" return 3.1415926535897936\n", | |
" end::Float64))))" | |
] | |
}, | |
"execution_count": 52, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"@code_typed Float64(pi,RoundUp)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Particularly useful for:\n", | |
"* When functions behave slighlty differently for different parametric types: can avoid lots of \n", | |
"```\n", | |
"for T in (:Type1,:Type2)\n", | |
" @eval foo(x::$T) = ...\n", | |
"end\n", | |
"```\n", | |
"* Variable agurments, e.g. multidimensional indexing:\n", | |
"```\n", | |
"getindex(A::MyArray, inds...)\n", | |
"```\n", | |
"* Used in Keno's [Cxx.jl](https://github.com/Keno/Cxx.jl): allows calling C++ straight from Julia (no wrappers, no precompilation)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Special functions\n", | |
"\n", | |
"Do you know what your `sin` function does?\n", | |
"\n", | |
"* Recent libraries on modern CPUs (support SSE instructions, i.e. the last 5 years or so) don't use the x87 assembly routines (e.g. FSIN)\n", | |
"\n", | |
"In [openlibm](https://github.com/JuliaLang/openlibm) (the default math library used by Julia), there are two steps:\n", | |
"\n", | |
"1. Range reduction: find an integer $n$ and $y \\in [-\\frac{\\pi}{4},\\frac{\\pi}{4}]$ such that\n", | |
" $$ x = n \\frac{\\pi}{2} + y $$\n", | |
" * this is tricky (as $\\pi$ does not correspond to a floating point number):\n", | |
" - Cody–Waite for small values\n", | |
" - Payne–Hanek for large values\n", | |
" \n", | |
" * Calculate $y$ with extra precision: typically stored as a pair of floating point numbers `y_hi + y_lo`\n", | |
" \n", | |
"2. Based on $n$, compute the appropriately signed `sin`/`cos` of $y$.\n", | |
" * a minimax polynomial approximation (similar, but distinct from, a Taylor's series truncation).\n", | |
" * some other libraries incorporate a further range reduction (e.g. $[-\\pi/128,\\pi/128]$), combined with a table-lookup and a shorter polynomial." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"However [a](http://stackoverflow.com/questions/14429902/abssinpi-for-mitigating-systematic-rounding-error) [LOT](http://stackoverflow.com/questions/2670999/how-can-i-work-around-the-fact-that-in-c-sinm-pi-is-not-0) [of](http://stackoverflow.com/questions/22439827/why-is-sinpi-4-different-from-cospi-4-in-python) [people](http://stackoverflow.com/questions/7561698/javascripts-geometrical-methods-zero-isnt-exactly-zero-is-it) [want](http://stackoverflow.com/questions/6566512/value-of-sine-180-is-coming-out-as-1-22465e-16) [to](http://stackoverflow.com/questions/27643267/storing-cos-and-sin-into-a-vector-and-getting-weird-values) [calculate](http://stackoverflow.com/questions/9780057/sin-function-not-giving-correct-answer)\n", | |
"\n", | |
"$$ \\sin \\frac{n \\pi}{N}$$\n", | |
"\n", | |
"for $n = 0,1,\\ldots,N$, e.g.\n", | |
"* Simple discretisations of the circle,\n", | |
"* Fast Fourier transforms (FFTs)\n", | |
"* Bessel functions, complex numbers, etc.\n", | |
"\n", | |
"Although technically correct (according to the IEEE 754 floating point standard), it does seem silly to give answers like:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1.2246467991473532e-16" | |
] | |
}, | |
"execution_count": 53, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sin(10*pi/10)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.49999999999999994" | |
] | |
}, | |
"execution_count": 54, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sin(pi/6)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But we can do better…" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.0" | |
] | |
}, | |
"execution_count": 55, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sinpi(10/10)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.5" | |
] | |
}, | |
"execution_count": 56, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sinpi(1/6)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This function is basically\n", | |
"1. A range reduction modulo 1/2\n", | |
" * *can* be done exactly using a standard `rem` operation (or even faster using some other tricks).\n", | |
"2. Multiply by $\\pi$\n", | |
" * Do this using \"double-double\" arithmetic to get `y_hi + y_lo`\n", | |
"3. Call the \"kernel\" `sin`/`cos`\n", | |
"\n", | |
"\n", | |
"* More accurate: to within 1 unit in the last place,\n", | |
"* Is actually *faster* then `sin(pi*x)` (we avoid the expensive range-reduction),\n", | |
"* Is implemented *all* in native Julia." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Base.Math.DoubleFloat64(0.5235987755982988,2.835264368075643e-17)" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"a = Base.Math.mulpi_ext(1/6)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.5" | |
] | |
}, | |
"execution_count": 58, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Base.Math.sin_kernel(a)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Mathematical functions\n", | |
"\n", | |
"Slowly, a lot of the mathematical functions are being implented in Julia.\n", | |
"\n", | |
"* `reinterpret` makes it easy to implement \"bit-twiddling\" functions\n", | |
" * e.g. `significand`, `exponent`, `frexp`.\n", | |
" * much clearer than type-punning in C" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1.2" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# find y = x * 2^k ∈ [1,2)\n", | |
"reinterpret(Float64,\n", | |
" (reinterpret(UInt64, 0.3) & 0x800f_ffff_ffff_ffff)| 0x3ff0000000000000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"* `@label`/`@goto` makes it very easy to translate Fortran code\n", | |
" * See Mike Nolta's [SpecialFunctions.jl](https://github.com/nolta/SpecialFunctions.jl): a translation of openspecfun (AMOS and Fadeeva)\n", | |
"\n", | |
"Will hopefully allow users to easily understand what their code is doing (and how to improve it)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Lower-level functions\n", | |
"\n", | |
"`llvmcall` allows you to call LLVM IR directly. This allows you to directly call LLVM intrinsics without C code." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 60, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"getmxcsr (generic function with 1 method)" | |
] | |
}, | |
"execution_count": 60, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# accesses the SSE floating point control status register (mxcsr)\n", | |
"\n", | |
"function getmxcsr()\n", | |
" Base.llvmcall(\"\"\"\n", | |
" %ptr = alloca i32\n", | |
" call void @llvm.x86.sse.stmxcsr(i32 * %ptr)\n", | |
" %curval = load i32 * %ptr\n", | |
" ret i32 %curval\"\"\", UInt32, ())\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0x00001fa1" | |
] | |
}, | |
"execution_count": 62, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"getmxcsr()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Flags: 100001\n", | |
"Den = 0: 0\n", | |
"Masks: 111111\n", | |
"Rounding: 00\n", | |
"Flush den: 0\n" | |
] | |
} | |
], | |
"source": [ | |
"function decmx(u::UInt32)\n", | |
" b = bits(u)\n", | |
" println(\"Flags: \",b[27:32])\n", | |
" println(\"Den = 0: \",b[26])\n", | |
" println(\"Masks: \",b[20:25])\n", | |
" println(\"Rounding: \",b[18:19])\n", | |
" println(\"Flush den: \",b[17])\n", | |
"end\n", | |
"decmx(getmxcsr())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 64, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Inf" | |
] | |
}, | |
"execution_count": 64, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"x = 1.0 / 0.0" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 65, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Flags: 100101\n", | |
"Den = 0: 0\n", | |
"Masks: 111111\n", | |
"Rounding: 00\n", | |
"Flush den: 0\n" | |
] | |
} | |
], | |
"source": [ | |
"decmx(getmxcsr())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"function setmxcsr(u::UInt32)\n", | |
" Base.llvmcall(\"\"\"%ptr = alloca i32\n", | |
" store i32 %0, i32 * %ptr\n", | |
" call void @llvm.x86.sse.ldmxcsr(i32 * %ptr)\n", | |
" ret void\"\"\", Void, (UInt32,),u)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"FE_MASK_DIVBYZERO = 0b0_0_00_000100_0_000000\n", | |
"u = getmxcsr()\n", | |
"setmxcsr(u & ~FE_MASK_DIVBYZERO)\n", | |
"decmx(getmxcsr())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"1.0/0.0" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"setmxcsr(u)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In future, I'm hoping to be able to use this to allow for a sane approach to floating point exceptions.\n", | |
" * Ideally, integrate this into the future debugger." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"## Even lower\n", | |
"\n", | |
"We can also use this for inline assembly (requires LLVM 3.4+)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"asm_pi (generic function with 1 method)" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"function asm_pi()\n", | |
" Base.llvmcall(\"\"\"\n", | |
" %pi = call double asm \"fldpi\", \"={st},~{dirflag},~{fpsr},~{flags}\"() #0\n", | |
" ret double %pi\"\"\",\n", | |
" Float64, ())\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3.141592653589793" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"asm_pi()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n", | |
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n", | |
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n", | |
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n", | |
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\t.section\t__TEXT,__text,regular,pure_instructions\n", | |
"Filename: In[21]\n", | |
"Source line: 0\n", | |
"\tpushq\t%rbp\n", | |
"\tmovq\t%rsp, %rbp\n", | |
"Source line: 2\n", | |
"\tfldpi\n", | |
"\tfstpl\t-8(%rbp)\n", | |
"\tvmovsd\t-8(%rbp), %xmm0\n", | |
"\tpopq\t%rbp\n", | |
"\tretq\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_native asm_pi()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Other cool stuff in 0.4: meta tags\n", | |
"\n", | |
"These are \"extra\" instructions to control various optimisations.\n", | |
"\n", | |
"## `@inbounds`\n", | |
"\n", | |
"This disables boundary checking of array access and assignment (i.e. standard C behaviour)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"second_element (generic function with 1 method)" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"second_element(x) = x[2]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define i64 @julia_second_element_43258(%jl_value_t*) {\n", | |
"top:\n", | |
" %1 = getelementptr inbounds %jl_value_t* %0, i64 2, !dbg !2566\n", | |
" %2 = bitcast %jl_value_t* %1 to i64*, !dbg !2566\n", | |
" %3 = load i64* %2, align 8, !dbg !2566, !tbaa %jtbaa_arraylen\n", | |
" %4 = icmp ugt i64 %3, 1, !dbg !2566\n", | |
" br i1 %4, label %idxend, label %oob, !dbg !2566\n", | |
"\n", | |
"oob: ; preds = %top\n", | |
" %5 = alloca i64, align 8, !dbg !2566\n", | |
" store i64 2, i64* %5, align 8, !dbg !2566\n", | |
" call void @jl_bounds_error_ints(%jl_value_t* %0, i64* %5, i64 1), !dbg !2566\n", | |
" unreachable, !dbg !2566\n", | |
"\n", | |
"idxend: ; preds = %top\n", | |
" %6 = getelementptr inbounds %jl_value_t* %0, i64 1, !dbg !2566\n", | |
" %7 = bitcast %jl_value_t* %6 to i8**, !dbg !2566\n", | |
" %8 = load i8** %7, align 8, !dbg !2566, !tbaa %jtbaa_arrayptr\n", | |
" %9 = getelementptr i8* %8, i64 8, !dbg !2566\n", | |
" %10 = bitcast i8* %9 to i64*, !dbg !2566\n", | |
" %11 = load i64* %10, align 8, !dbg !2566, !tbaa %jtbaa_user\n", | |
" ret i64 %11, !dbg !2566\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_llvm second_element([1,2,3])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"second_element (generic function with 1 method)" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"second_element(x) = @inbounds x[2]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define %jl_value_t* @julia_second_element_43259(%jl_value_t*, %jl_value_t**, i32) {\n", | |
"top:\n", | |
" ret %jl_value_t* inttoptr (i64 4371070976 to %jl_value_t*), !dbg !2570\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"@code_llvm second_element([1,2,3])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"\n", | |
"## `@inline`\n", | |
"\n", | |
"Function inlining replaces a function call with the actual code from the function. Julia will automatically inline certain functions based on certain rules. \n", | |
"\n", | |
"Some additional control is provided by:\n", | |
"\n", | |
"* `@inline` macro marks functions at declaration which should be inlined (similar to C `inline` statement).\n", | |
"\n", | |
"* `@noinline` marks functions which should NOT be inlined.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## `@fastmath`\n", | |
"\n", | |
"This enables certain floating point optimisations that aren't strictly valid according to the IEEE-754 floating point standard:\n", | |
"* allows reordering of instructions\n", | |
"* allows substitution of fused multiply-add (FMA) instructions\n", | |
"* assumes all values are finite (no Inf or NaNs)\n", | |
"* treats signed zeros are the the same.\n", | |
"* allows reciprocal approximations to division. \n", | |
"\n", | |
"Similar to the C `-ffast-math` compiler flag, but applied only locally." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 66, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define double @julia_third_43558(double) {\n", | |
"top:\n", | |
" %1 = fdiv double %0, 3.000000e+00, !dbg !3506\n", | |
" ret double %1, !dbg !3506\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"third(x) = x/3\n", | |
"@code_llvm third(1.0)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"define double @julia_third_43559(double) {\n", | |
"top:\n", | |
" %1 = fmul double %0, 0x3FD5555555555555, !dbg !3510\n", | |
" ret double %1, !dbg !3510\n", | |
"}\n" | |
] | |
} | |
], | |
"source": [ | |
"third(x) = @fastmath x/3\n", | |
"@code_llvm third(1.0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": false | |
}, | |
"source": [ | |
"In future could be used for faster, but less accurate, special functions (`sin`, `log`, etc.)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"## `@simd`\n", | |
"\n", | |
"Allows for limited reordering of loops so as to be able to take advantage of SIMD (single instruction, multiple data) operations on modern processors. \n", | |
"\n", | |
"* Particularly useful for `Int32`/`Float32` operands, though there is typically still some speedup on `Float64` as well." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"# Conclusions\n", | |
"\n", | |
"* `@less` is your friend.\n", | |
"* `@code_*` can tell you lots of useful stuff.\n", | |
"* Meta tags are going to allow more power, and avoid the need for compiler flags" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Julia 0.4.0-dev", | |
"language": "julia", | |
"name": "julia 0.4" | |
}, | |
"language_info": { | |
"name": "julia", | |
"version": "0.4.0" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment