It can be hard or impossible to know before compiling when context switches will happen in Go. As a result, if many goroutines run CPU-intensive work concurrently (such as the Blowfish key expansion step of bcrypt), the program can reach a state where work almost never finishes, because the scheduler keeps switching in new work as it is added. Generally we want CPU-intensive work to run in parallel to completion rather than yield partway through. If all the operations are roughly the same length, Go will also tend to complete them all at roughly the same time.
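The effect described above can be observed with a minimal sketch like the following. It assumes a stand-in workload: `cpuWork` (a hypothetical helper that hashes repeatedly with SHA-256) takes the place of bcrypt's key expansion so that the example needs only the standard library. The names `cpuWork` and `runConcurrent` and the goroutine and round counts are illustrative choices, not part of the original benchmark.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

// cpuWork is a hypothetical stand-in for an expensive operation such as
// bcrypt's key expansion: it does no I/O and never blocks.
func cpuWork(rounds int) [32]byte {
	sum := sha256.Sum256([]byte("seed"))
	for i := 0; i < rounds; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum
}

// runConcurrent starts n goroutines at once and returns how long each one
// took to finish, measured from the common start time.
func runConcurrent(n, rounds int) []time.Duration {
	durations := make([]time.Duration, n)
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			cpuWork(rounds)
			// Each goroutine writes only to its own slot.
			durations[i] = time.Since(start)
		}(i)
	}
	wg.Wait()
	return durations
}

func main() {
	for i, d := range runConcurrent(64, 200000) {
		fmt.Printf("goroutine %2d finished after %v\n", i, d)
	}
}
```

Comparing the printed completion times with the time a single call to `cpuWork` takes on its own gives a rough picture of how much each operation is delayed by the others. The exact numbers depend on the Go version, the number of cores, and the value of GOMAXPROCS.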
Inlining helps with this problem, but complicated crypto functions like blowfish.ExpandKey can be more complex than the inliner's "hairiness budget" allows. Turning this budget up helps a lot, but results in bigger binary sizes.
The benchmark in bcrypt_benchmark_test.go was run against Go 1.6, Go 1.7 HEAD (as of 2016-04-25), and the same dev version of Go 1.7 with the max inlining budget turned up to 1000 using the enclosed patch.
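For readers unfamiliar with parallel benchmarks in Go, here is a rough sketch of the general shape such a benchmark can take. This is not the contents of bcrypt_benchmark_test.go: `expensiveHash` is a hypothetical SHA-256 stand-in for a bcrypt call, and the round count is arbitrary. It uses `testing.Benchmark` so that it can run as an ordinary program, outside `go test`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// expensiveHash is a hypothetical stand-in for a bcrypt hash; it only
// burns CPU by hashing repeatedly.
func expensiveHash(rounds int) [32]byte {
	sum := sha256.Sum256([]byte("password"))
	for i := 0; i < rounds; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum
}

// benchmarkParallel runs expensiveHash from several goroutines at once.
// RunParallel distributes the b.N iterations across those goroutines.
func benchmarkParallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			expensiveHash(10000)
		}
	})
}

func main() {
	// testing.Benchmark chooses b.N automatically and reports the
	// iteration count and time per operation.
	result := testing.Benchmark(benchmarkParallel)
	fmt.Println(result.String())
}
```

Running the same program under different toolchains, as the comparison above does with Go 1.6 and the Go 1.7 development builds, would then show how the time per operation changes between them.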