Thread pools on the JVM should usually be divided into the following three categories:
- CPU-bound
- Blocking IO
- Non-blocking IO polling
Each of these categories has a different optimal configuration and usage pattern.
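As a rough illustration of what "different configuration" can look like, here is a minimal Java sketch with one pool per category. The excerpt above does not state any sizing rules, so the specific choices (one thread per core for CPU-bound work, an unbounded cached pool for blocking IO, a single high-priority thread for polling) are assumptions based on common practice. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPools {
    // Hypothetical helper: names threads per pool so they are easy to spot in a profiler.
    static ThreadFactory daemonFactory(String prefix, int priority) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            t.setPriority(priority);
            return t;
        };
    }

    // Assumption: CPU-bound work gets one thread per available core.
    static int cpuPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    static ExecutorService newCpuBoundPool() {
        return Executors.newFixedThreadPool(
            cpuPoolSize(), daemonFactory("cpu", Thread.NORM_PRIORITY));
    }

    // Assumption: blocking IO gets an unbounded, cached pool, since its threads
    // spend most of their time parked rather than competing for cores.
    static ExecutorService newBlockingIoPool() {
        return Executors.newCachedThreadPool(
            daemonFactory("blocking-io", Thread.NORM_PRIORITY));
    }

    // Assumption: non-blocking IO polling needs very few threads (here, one),
    // run at high priority and used only to dispatch events elsewhere.
    static ExecutorService newIoPollingPool() {
        return Executors.newSingleThreadExecutor(
            daemonFactory("io-poll", Thread.MAX_PRIORITY));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService cpu = newCpuBoundPool();
        ExecutorService blocking = newBlockingIoPool();
        ExecutorService polling = newIoPollingPool();

        // Each task reports the name of the thread it ran on.
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        System.out.println(cpu.submit(whoAmI).get());
        System.out.println(blocking.submit(whoAmI).get());
        System.out.println(polling.submit(whoAmI).get());

        cpu.shutdown();
        blocking.shutdown();
        polling.shutdown();
    }
}
```

Keeping the three pools as separate objects is the main point of the sketch: a blocking call submitted to the CPU-bound pool would occupy one of its few threads, so work should be routed to the pool that matches its category.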
// benchmark Java/Scala BigInt/BitSet implementations
import scala.util.Random
import scala.collection.mutable
import scala.collection.immutable
import java.util.BitSet
import java.math.BigInteger
// derived from https://github.com/alexmasselot/benchmark-bitarray/blob/master/src/benchmark/bitarray/TimeIt.scala
def timeInMilli(n: Int, f: () => Unit) = {
# Description: Boxstarter Script
# Author: Jess Frazelle <[email protected]>
# Last Updated: 2017-09-11
#
# Install boxstarter:
# . { iwr -useb http://boxstarter.org/bootstrapper.ps1 } | iex; get-boxstarter -Force
#
# You might need to set: Set-ExecutionPolicy RemoteSigned
#
# Run this boxstarter by calling the following from an **elevated** command-prompt:
I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.
This is by no means a comprehensive gist. It's also important to understand that the techniques I outline here are not absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friends! Mea
I bundled these up into groups and wrote some thoughts about why I ask them!
If these helped you, I'd love to hear about it!! I'm on twitter @vcarl_ or send me an email [email protected]
https://blog.vcarl.com/interview-questions-onboarding-workplace/
There exist several DI frameworks / libraries in the Scala ecosystem. But the more functional code you write, the more you'll realize there's no need to use any of them.
A few of the most claimed benefits are the following:
import java.util.concurrent.atomic.*;
import java.util.concurrent.*;
public class Main {
  private static ExecutorService executor = Executors.newFixedThreadPool(2);
  private static int iterations = 10000000;
  public static class Runner {
    // writes to canceled happen before a CAS on suspended
    // reads on canceled happen after a CAS on suspended
So the hacker news post said my comment was too long. Warning, long, opinionated post:
Scala dev for 10+ years here. Spark is weird. Databricks has a style guide that deliberately chooses not to use scala features that the rest of the community uses, and doesn't follow the same best practices around library and scala major version usage that the rest of the community uses [1]. It's no surprise that the project has trouble interoperating with libraries outside of the spark ecosystem, and is therefore a maintenance problem.
Spark's style and compatibility problems
Scala isn't a maintenance nightmare, but it does attract a lot of newcomers who dive in, don't stick within one of its many ecosystems, get confused, and generally leave a mess. That is a direct result of the fact that scala is a multi-paradigm language that is relatively expressive compared to the one(s) it is competing with and pulling developers from, and that those developers, for the large part, don't really want to change and think that Scala is just a