Note: this guide is written for AWS ECS services, but everything from Step 4 onward applies equally to any Docker container on a Linux host.
- Log into the AWS Console using the appropriate AWS account
- Navigate to AWS ECS service clusters (https://console.aws.amazon.com/ecs/home)
- Make sure you are in the correct region. If not, switch to it using the second drop-down menu in the top right corner.
- Select the correct cluster (ex: https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters//services)
- In the Services tab, type the name of the service in the 'Filter in this page' text box
- Click the appropriate service name
- Click the Tasks tab
- Click one of the Task items, such as 95f7f552-5515-4997-81b7-8b6784a504b1. Copy this down as your Task ID.
- Click the Container instance, such as 3040f458-b8e7-45d6-b3bb-af4101fce510
- Copy the Private IP, such as 172.31.45.106
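If you prefer the command line, the console clicks above can be approximated with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured for the right account and region, and that the task runs on an EC2 container instance. The function name `ecs_task_private_ip` and its arguments are illustrative and not part of the original guide.

```shell
# Sketch: resolve an ECS task ID to the private IP of its container instance.
# Assumption: AWS CLI is configured; the task uses the EC2 launch type.
ecs_task_private_ip() {
    cluster=$1
    task_id=$2

    # Task -> container instance ARN
    ci_arn=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_id" \
        --query 'tasks[0].containerInstanceArn' --output text) || return 1

    # Container instance -> EC2 instance ID
    ec2_id=$(aws ecs describe-container-instances --cluster "$cluster" \
        --container-instances "$ci_arn" \
        --query 'containerInstances[0].ec2InstanceId' --output text) || return 1

    # EC2 instance -> private IP
    aws ec2 describe-instances --instance-ids "$ec2_id" \
        --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text
}
```

Call it with your cluster name and the Task ID, for example `ecs_task_private_ip <my-cluster-name> 95f7f552-5515-4997-81b7-8b6784a504b1`.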
If you use a bastion host to connect to your EC2 Container Instances, you might use a command like the following:
ssh -v \
-o ProxyCommand='ssh ec2-user@<my-bastion-hostname> -i ~/.ssh/<my-bastion-key> -N -W %h:%p' \
-o ForwardAgent=yes \
-i ~/.ssh/<ec2-container-instance-key> \
-l ec2-user \
172.31.45.106
- The last line is the Private IP you copied before.
- List docker processes and grep for the service name.
docker ps | grep -i <My_ECS_Service_Name>
8e1f316737c3 <my-container-repo-name-tag> "/some/long/command…" 2 months ago Up 2 months 8080/tcp, 0.0.0.0:11720->8080/tcp <some-long-service-name-string-d4e99688fca48e89fd01>
- If it looks like the right service, copy the first part of the output (8e1f316737c3). This is the Docker container ID.
- List all containers' IDs and task IDs and grep for the correct one
docker ps -q \
| xargs -n1 docker inspect -f '{{.Id}} {{index .Config.Labels "com.amazonaws.ecs.task-arn"}}' \
| grep 95f7f552-5515-4997-81b7-8b6784a504b1
8e1f316737c34a57243c87444b8ba2ad6b5eea140df8988dc44f0692fc3fc90a arn:aws:ecs:us-east-1:1234567890:task/95f7f552-5515-4997-81b7-8b6784a504b1
- The first part of the output (before a space) is the container ID. You only need the first 12 characters of it.
- Get the host process ID of the container's process.
docker inspect -f '{{.State.Pid}}' 8e1f316737c3
28780
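The lookups above (task ID to container ID, container ID to host PID) can be combined into one helper. This is a sketch rather than a definitive script: the function name `container_pids_for_task` is made up for illustration, and it assumes the ECS agent has set the `com.amazonaws.ecs.task-arn` label used earlier. A task with several containers prints one line per container.

```shell
# Sketch: print "<short container ID> <host PID>" for each running
# container that belongs to the given ECS task ID.
container_pids_for_task() {
    task_id=$1
    for cid in $(docker ps -q); do
        # The ECS agent labels each container with the ARN of its task.
        arn=$(docker inspect -f '{{index .Config.Labels "com.amazonaws.ecs.task-arn"}}' "$cid")
        case "$arn" in
            *"$task_id")
                # Host-side PID of the container's main process.
                pid=$(docker inspect -f '{{.State.Pid}}' "$cid")
                echo "$cid $pid"
                ;;
        esac
    done
}
```

For example, `container_pids_for_task 95f7f552-5515-4997-81b7-8b6784a504b1` would print the container ID and PID used in the rest of this guide.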
In order to perform a heap dump or core dump, you need some tools to extract the memory of the application. If curl isn't available on your system, try wget.
For Java applications, download the tools jattach, gcore and gdb:
for i in jattach gcore gdb ; do \
curl -L -o $i https://github.com/pwillis-els/docker-build-static/releases/download/v0.1.0/$i ; \
done
chmod 755 jattach gcore gdb
For other applications, download just gcore and gdb as shown above.
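The curl-or-wget fallback mentioned above could be wrapped in a small helper. This is a sketch under a few assumptions: the names `fetch`, `fetch_tools` and `BASE_URL` are illustrative, and the `-f` flag (fail on HTTP errors instead of saving an error page) is an addition not present in the original command.

```shell
# Release URL taken from the download loop above.
BASE_URL=https://github.com/pwillis-els/docker-build-static/releases/download/v0.1.0

# Download URL $2 to file $1 with curl, falling back to wget.
fetch() {
    out=$1
    url=$2
    if command -v curl >/dev/null 2>&1; then
        curl -fL -o "$out" "$url"
    else
        wget -O "$out" "$url"
    fi
}

# Download each named tool into the current directory and make it executable.
fetch_tools() {
    for tool in "$@"; do
        fetch "$tool" "$BASE_URL/$tool" && chmod 755 "$tool" || return 1
    done
}
```

Use `fetch_tools jattach gcore gdb` for Java applications, or `fetch_tools gcore gdb` otherwise.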
There are two ways to generate heap dumps for Java processes. The first option will attempt to communicate with the JVM and instruct it to perform the heap dump itself. If the JVM is not responding, this will not work, and you'll have to switch to option 2: a core dump of the process.
The reasons jattach is used (and not jmap or jcmd) are as follows:
- jattach is a tiny binary that's easy to download or copy, so no need to provide an entire JDK, as you do with jmap and jcmd.
- jattach works on the host operating system and enters the container namespace before opening a socket to the running java process.
- If jmap or jcmd fails to attach to the JRE, it will fall back to using ptrace() to copy memory one byte at a time. In that case, the gdb method below is much faster, though the resulting file may be larger.
- Attempt jattach heap dump using the host process ID from earlier.
This will create a heap dump file inside the container (until we come up with a revised method).
sudo ./jattach 28780 dumpheap /tmp/dumpheap-28780.dump
- Copy the dump file out of the container, onto the local disk. This will use the container ID from before.
docker cp 8e1f316737c3:/tmp/dumpheap-28780.dump host-dumpheap-28780.dump
- Remove the heap dump from the container.
docker exec -it 8e1f316737c3 rm /tmp/dumpheap-28780.dump
Note: at the end of the sudo ./jattach 28780 dumpheap /tmp/dumpheap-28780.dump command, you may want to add the option -live. This will only collect live objects in memory, making the heap dump significantly smaller. It's up to the developers to decide which they'd prefer: -live or -all (the default).
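The three jattach steps (dump, copy out, clean up) can be chained so that a failed dump does not leave you running the later steps by hand. This is a minimal sketch: the function name `jattach_heap_dump` is hypothetical, it assumes ./jattach is in the current directory as downloaded earlier, and it assumes the optional second argument is passed through unchanged as -live or -all.

```shell
# Sketch: usage: jattach_heap_dump <container-id> [-live|-all]
jattach_heap_dump() {
    cid=$1
    mode=$2

    # Host PID of the container's main process.
    pid=$(docker inspect -f '{{.State.Pid}}' "$cid") || return 1
    dump=/tmp/dumpheap-$pid.dump

    # Ask the JVM to write the dump; the path is inside the container.
    sudo ./jattach "$pid" dumpheap "$dump" ${mode:+"$mode"} || return 1

    # Copy the dump to the host, then remove it from the container.
    docker cp "$cid:$dump" "host-dumpheap-$pid.dump" || return 1
    docker exec "$cid" rm "$dump"
}
```

For example, `jattach_heap_dump 8e1f316737c3 -live` would leave host-dumpheap-28780.dump in the current directory on the host.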
If the jattach above either didn't work or doesn't apply, use the following method. There are two ways to go about this, so try the first, then the second.
- Attempt gcore core dump using the host process ID from earlier.
You will probably see a lot of output and a bunch of errors.
sudo ./gcore -a 28780
- Wait until you see a line that says "Saved corefile core.28780" or similar. This file should exist in your current directory; in the steps that follow, use it in place of the host-dumpheap-28780.dump file.
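A core file is often about as large as the process's resident memory, or larger, so it may be worth checking free disk space before running gcore. The following is a rough pre-check only, not something the original guide prescribes: the helper names and the `PROC_ROOT` variable are illustrative, and resident set size is used as an approximate lower bound on the core size.

```shell
# Sketch: rough disk-space check before a core dump (Linux /proc assumed).
PROC_ROOT=${PROC_ROOT:-/proc}

# Resident set size of a process in KiB, from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$PROC_ROOT/$1/status"
}

# Free space in KiB on the filesystem that holds the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if the directory has more free space than the process's RSS.
enough_space_for_core() {
    pid=$1
    dir=${2:-.}
    need=$(rss_kb "$pid") || return 1
    have=$(free_kb "$dir") || return 1
    [ -n "$need" ] && [ "$have" -gt "$need" ]
}
```

A possible use is `enough_space_for_core 28780 . && sudo ./gcore -a 28780`.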
You have a couple options here:
- Compressing with xz (much better compression than gzip, though usually slower)
xz -v -7 host-dumpheap-28780.dump
- Compressing with zstd. You will have to download this tool, but it is lightning fast (50 seconds rather than 10 minutes) and compresses about as well as xz. However, it may use more memory and/or CPU.
./zstd -v -7 host-dumpheap-28780.dump
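The choice between the two compressors can be scripted. This sketch uses zstd if it is on the PATH and falls back to xz otherwise. The function name `compress_dump` is illustrative, and the -T0 flag (use all available CPU cores) is an addition to the commands shown above. If you downloaded zstd into the current directory, add that directory to PATH or adjust the command accordingly.

```shell
# Sketch: compress a dump with zstd when available, otherwise with xz.
compress_dump() {
    file=$1
    if command -v zstd >/dev/null 2>&1; then
        # zstd keeps the original file by default.
        zstd -v -7 -T0 "$file"
    else
        # xz replaces the original file with the .xz file.
        xz -v -7 -T0 "$file"
    fi
}
```

For example, `compress_dump host-dumpheap-28780.dump` or `compress_dump core.28780`.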
- TBD
From the EC2 Container Instance, you can attempt to jump into the container using its Container ID, with the following command:
docker exec -it 8e1f316737c3 sh
/deployments $
- Options for capturing Java heap dumps: https://www.baeldung.com/java-heap-dump-capture
- jattach:
- How to capture java heap dumps 7 ways: https://dzone.com/articles/how-to-capture-java-heap-dumps-7-options
- Finding java memory leaks from a heap dump: https://dzone.com/articles/finding-java-memory-leaks-from-a-heap-dump
- More on Java heap dumps: https://javaeesupportpatterns.blogspot.com/2012/11/java-heap-dump-are-you-up-to-task.html
- StackOverflow links:
- Java Heap Analysis Tool: https://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html
If you are already using SSH, you should be able to use scp over the existing connection instead of dropping the file on S3.
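As one possible way to do this, scp accepts the same ProxyCommand option as the ssh command shown earlier. This is a sketch under assumptions: the function name and its parameters are hypothetical, the user is ec2-user on both hops as in the ssh example, and the remote path is relative to that user's home directory.

```shell
# Sketch: copy a dump from the container instance through the bastion.
fetch_dump_via_bastion() {
    bastion=$1        # bastion hostname
    bastion_key=$2    # path to the bastion private key
    instance_key=$3   # path to the container instance private key
    host=$4           # private IP of the container instance
    remote_path=$5    # path of the dump file on the container instance

    scp \
        -o ProxyCommand="ssh ec2-user@$bastion -i $bastion_key -N -W %h:%p" \
        -i "$instance_key" \
        "ec2-user@$host:$remote_path" .
}
```

The file is written to the current directory on your local machine.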