
Memory LIMIT and REQUEST in Containers and JVM

  • Do you run a JVM inside a container on Kubernetes (or maybe OpenShift)?
  • Do you struggle with REQUEST and LIMIT parameters?
  • Do you know the impact of those parameters on your JVM?
  • Have you met the OOM Killer?

Hope you will find answers to these questions in this example-based article.

How to set up JVM Heap size in a Container

There are actually 3 common ways to configure your Heap size.

  • JVM Ergonomics - in general 1/4 of the memory identified by the JVM (unless you have a very constrained device, I think 256MB or less, where the ratio between the heap size and the memory provided by the OS becomes bigger, up to 1/2)
  • MaxRAMPercentage specifies the Heap size as a percentage of the memory identified by the JVM
  • Xmx directly provides the size of the heap, without any ergonomics
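
A minimal sketch of the three options using docker (the image and the 512m limit are just illustrative values; we grep the effective MaxHeapSize out of the final flags):

$ # 1) Ergonomics: no flags, the JVM picks ~1/4 of the container memory
$ docker run -it --rm -m 512m adoptopenjdk java -XX:+PrintFlagsFinal -version | grep MaxHeapSize

$ # 2) Percentage of the identified memory
$ docker run -it --rm -m 512m adoptopenjdk java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal -version | grep MaxHeapSize

$ # 3) Explicit heap size, no ergonomics involved
$ docker run -it --rm -m 512m adoptopenjdk java -Xmx256m -XX:+PrintFlagsFinal -version | grep MaxHeapSize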

However, what does the memory identified by the JVM mean in a World of Containers?

Some time ago, the JVM became Container-aware. That means that the JVM is able to recognize that it runs inside a Container and that its memory and CPU are somehow limited. Based on these limitations, the JVM adjusts its maximum heap size to reflect the constraints.
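
If you want to see what the JVM actually detected, here is a quick sketch (the system category of -XshowSettings is available on Linux since JDK 11; output abbreviated, the exact lines vary by JDK version):

$ docker run -it --rm -m 512m adoptopenjdk java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Memory Limit: 512.00M
    ...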

How can we configure the memory limit?

Let's create a simple Java program and place it into the /tmp folder (running it straight from the source file as below requires JDK 11+):

public class Blocker {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Running...");
        // Block forever: joining the current thread never returns,
        // which keeps the container alive so we can inspect the JVM with jcmd
        Thread.currentThread().join();
    }
}

You can notice -XX:MaxHeapSize=5188354048 below; it corresponds to 1/4 of the total memory of my laptop.

$ docker run -it --rm --name test -v /tmp:/tmp adoptopenjdk java /tmp/Blocker.java
Running...

$ docker exec test jcmd 1 VM.flags
-XX:CICompilerCount=4 -XX:ConcGCThreads=2 -XX:G1ConcRefinementThreads=8 -XX:G1HeapRegionSize=4194304 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=327155712 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=5188354048 -XX:MaxNewSize=3112173568 -XX:MinHeapDeltaBytes=4194304 -XX:MinHeapSize=8388608 -XX:NonNMethodCodeHeapSize=5839372 -XX:NonProfiledCodeHeapSize=122909434 -XX:ProfiledCodeHeapSize=122909434 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=5188354048 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC

$ grep MemTotal /proc/meminfo
MemTotal:       20258148 kB
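
A quick sanity check of that 1/4 ratio (the 4 MiB rounding granularity is an assumption based on the -XX:G1HeapRegionSize=4194304 flag above):

20258148 kB * 1024 = 20744343552 bytes of physical memory
20744343552 / 4    =  5186085888 bytes (the 1/4 ergonomic default)
rounded up to 4 MiB alignment: 1237 * 4194304 = 5188354048 (-XX:MaxHeapSize)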

Let's limit the memory to 1GB -> the JVM now reports -XX:MaxHeapSize=268435456 (256MB, again 1/4)

  • we can even see a different GC !!! (Is G1GC a default Garbage Collector? :))
$ docker run -it --rm -m 1g --name test -v /tmp:/tmp adoptopenjdk java /tmp/Blocker.java
Running...

$ docker exec test jcmd 1 VM.flags
1:
-XX:CICompilerCount=4 -XX:InitialHeapSize=16777216 -XX:MaxHeapSize=268435456 -XX:MaxNewSize=89456640 -XX:MinHeapDeltaBytes=196608 -XX:MinHeapSize=8388608 -XX:NewSize=5570560 -XX:NonNMethodCodeHeapSize=5839372 -XX:NonProfiledCodeHeapSize=122909434 -XX:OldSize=11206656 -XX:ProfiledCodeHeapSize=122909434 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=268435456 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseSerialGC
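
This also answers the G1 question above: G1 is the default collector only when the JVM considers the machine "server-class", which means at least 2 available CPUs and at least 1792MB of memory; below that, ergonomics falls back to SerialGC. A quick sketch to confirm (the 2g / 2 CPU values are just illustrative):

$ docker run -it --rm -m 2g --cpus 2 --name test -v /tmp:/tmp adoptopenjdk java /tmp/Blocker.java
Running...

$ docker exec test jcmd 1 VM.flags | grep -o 'UseG1GC'
UseG1GC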

How can I pass the memory limit using OpenShift/Kubernetes?

I'm using OpenShift, however, you can do the same thing (maybe with a slightly different syntax) on Kubernetes or other schedulers.

resources:
  limits:
    cpu: '1'
    memory: 2Gi
  requests:
    cpu: '1'
    memory: 2Gi

How can JVM find the memory limit?

If we don't pass any memory limit:

docker run -it adoptopenjdk cat /sys/fs/cgroup/memory/memory.limit_in_bytes                                                   
9223372036854771712

With 1GB memory limit:

docker run -it -m 1g adoptopenjdk cat /sys/fs/cgroup/memory/memory.limit_in_bytes
1073741824
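
Note that memory.limit_in_bytes is a cgroup v1 path. On hosts running cgroup v2 (the default on recent distributions), the same limit lives in a different file:

$ docker run -it -m 1g adoptopenjdk cat /sys/fs/cgroup/memory.max
1073741824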

Try the same with your favourite scheduler! Spin up a POD in Kubernetes and execute the command above to figure out the memory limit!

How is the memory configured in the case of different REQUEST and LIMIT?

  • What if I configure different REQUEST and LIMIT sizes in Kubernetes?
  • We need to somehow calculate the Heap size of our JVM process if it is based on ergonomics or a percentage
  • What is the memory "identified" by the JVM?

Try to set different resources for your POD:

resources:
  limits:
    cpu: '1'
    memory: 2Gi
  requests:
    cpu: '1'
    memory: 1Gi
  • get the container limit that the configuration above passes into the container/POD.
  • execute this command inside your POD:
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
2147483648
  • you can notice that the memory limit is configured according to the LIMIT value!
  • let's have a look at what the JVM says in its logs; spin up a new java process inside the existing POD:
java -Xlog:os+container=trace -version
Picked up JAVA_TOOL_OPTIONS: -XX:InitialRAMPercentage=80 -XX:MaxRAMPercentage=80 -XX:+UseSerialGC
[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][trace][os,container] Path to /memory.use_hierarchy is /sys/fs/cgroup/memory/memory.use_hierarchy
[0.001s][trace][os,container] Use Hierarchy is: 1
[0.001s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/memory.limit_in_bytes
[0.001s][trace][os,container] Memory Limit is: 2147483648
[0.001s][info ][os,container] Memory Limit is: 2147483648
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpuacct,cpu/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: 100000
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpuacct,cpu/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] Path to /cpu.shares is /sys/fs/cgroup/cpuacct,cpu/cpu.shares
[0.001s][trace][os,container] CPU Shares is: 1024
[0.001s][trace][os,container] CPU Quota count based on quota/period: 1
[0.001s][trace][os,container] OSContainer::active_processor_count: 1
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 1
[0.017s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 1
[0.028s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/memory.limit_in_bytes
[0.028s][trace][os,container] Memory Limit is: 2147483648
[0.028s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes
[0.028s][trace][os,container] Memory Usage is: 1194393600

We can see plenty of interesting information!

  • This line is interesting for us right now: [0.001s][info ][os,container] Memory Limit is: 2147483648
  • That means that the JVM sees the value from memory.limit_in_bytes, and this value corresponds to the LIMIT value
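
In case you wonder where the Picked up JAVA_TOOL_OPTIONS line in the output above comes from: the JVM picks this environment variable up automatically, which makes it a convenient place for such flags in a POD spec. A minimal sketch (the env section sits next to the resources section of the container):

env:
  - name: JAVA_TOOL_OPTIONS
    value: '-XX:InitialRAMPercentage=80 -XX:MaxRAMPercentage=80 -XX:+UseSerialGC'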

How about JVM heap size?

  • You can notice that I started the JVM with these flags: Picked up JAVA_TOOL_OPTIONS: -XX:InitialRAMPercentage=80 -XX:MaxRAMPercentage=80 -XX:+UseSerialGC
  • Let's get the information about the VM flags from the currently running JVM process
jcmd 1 VM.flags
Picked up JAVA_TOOL_OPTIONS: -XX:InitialRAMPercentage=80 -XX:MaxRAMPercentage=80 -XX:+UseSerialGC
1:
-XX:CICompilerCount=2 -XX:InitialHeapSize=1719664640 -XX:InitialRAMPercentage=80.000000 -XX:MaxHeapSize=1719664640 -XX:MaxNewSize=573177856 -XX:MaxRAM=2147483648 -XX:MaxRAMPercentage=80.000000 -XX:MinHeapDeltaBytes=196608 -XX:MinHeapSize=8388608 -XX:NewSize=573177856 -XX:NonNMethodCodeHeapSize=5826188 -XX:NonProfiledCodeHeapSize=122916026 -XX:OldSize=1146486784 -XX:ProfiledCodeHeapSize=122916026 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=1719664640 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseSerialGC
  • The calculated Heap size is -XX:MaxHeapSize=1719664640, which corresponds to 80% of the total memory provided to the container!!
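
Again, the arithmetic roughly checks out (the exact rounding granularity is JVM-internal, so take the alignment step as an assumption):

2147483648 bytes (LIMIT) * 0.80 = 1717986918 bytes
rounded up to the heap alignment -> 1719664640 (-XX:MaxHeapSize)

Note that -XX:InitialRAMPercentage=80 makes -XX:InitialHeapSize land on the same value, so the heap starts at its maximum size right away.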

We are DONE! ... what?! Not at all!!

  • Isn't it a bit weird?!
  • My Heap size is ~1.7GB and the REQUEST is 1GB!?!?
  • REQUEST is only the guaranteed memory which is "reserved" to be given to our process.

What happens when we go beyond the REQUEST size?

You are standing on very thin ice. You can encounter 2 situations:

  • You are lucky! Your environment has spare capacity, your scheduler has plenty of resources and gives you more than you requested, and if you are really lucky you get the whole LIMIT size.
| ------------ (REQUEST) ------ (MAX HEAP) ------- (LIMIT - OS provided the entire LIMIT) |
  • You are not so lucky! Your environment is a bit busy and your scheduler is not able to give you the entire LIMIT, only what you requested. You start your JVM process and your heap fills up as you allocate new objects. The RSS metric constantly grows because minor page faults happen: the JVM touches previously untouched pages, and the OS needs to map pages of the process's virtual memory to physical memory. The JVM keeps allocating new objects without doing a GC, and when your RSS goes over the requested size (1GB in our case) you get an OOM Kill and your container/process is shut down.
| ------------ (REQUEST - OS provided only REQUEST) ----- (MAX HEAP) -------- (LIMIT) |

Could we reproduce the behavior mentioned above?

Yes. There are two options:

  • You can somehow make your environment busy, then start your application and keep your eyes on the RSS metric and the growing heap

  • or you can cheat a bit and use -XX:+AlwaysPreTouch when you start your container. This option ensures that your RSS does not grow together with your heap; instead, all pages dedicated to the JVM heap are touched at the very beginning, which means the RSS belonging to your Heap is immediately fully mapped. AlwaysPreTouch is often used with very large heaps (databases, ..) and sacrifices startup time (depending on the heap size it can take minutes, which is why ZGC implemented a parallel pre-touch: https://malloc.se/blog/zgc-jdk14)

  • set up your POD with this resource configuration:

resources:
  limits:
    cpu: '1'
    memory: 1Gi
  requests:
    cpu: '1'
    memory: 1Gi
  • run your JVM application with these options: -Xmx1g -Xms1g -XX:+AlwaysPreTouch
  • you shouldn't be able to start the process: you get OOM killed, because the JVM does not consist of heap memory only but of native memory as well, and both together really don't fit into the container :)
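
If you don't have a cluster at hand, here is a docker-only sketch of the same experiment (reusing the Blocker program from above; note there is no --rm, so we can inspect the dead container afterwards):

$ docker run -it -m 1g --name test -v /tmp:/tmp adoptopenjdk java -Xmx1g -Xms1g -XX:+AlwaysPreTouch /tmp/Blocker.java

$ docker inspect test --format '{{.State.OOMKilled}}'
true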