@mmkhitaryan
Created September 1, 2024 20:40
AWS ECS stuck in PENDING state g4dn

Disk space issue

In my case I used a container image that was 12 GB in size. It contained a YOLO model for inference, so the image was large. By default, the root volume in the ECS EC2 launch template is 30 GB, so it could not hold the entire OS plus the image.

The solution was to change the template's root storage to 100 GB.
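
As a rough illustration of that change, here is a minimal Python sketch that builds the block device mapping for a larger root volume. The helper name `root_volume_override`, the device name `/dev/xvda` (the usual root device on the Amazon Linux 2 ECS-optimized AMI) and the `gp3` volume type are assumptions for the example, not something taken from my actual setup, so check them against your own AMI.

```python
import json


def root_volume_override(size_gib: int, device_name: str = "/dev/xvda") -> dict:
    """Build launch template data that resizes the root EBS volume.

    device_name is an assumption: /dev/xvda is the usual root device on the
    Amazon Linux 2 ECS-optimized AMI, but other AMIs may use a different one.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "BlockDeviceMappings": [
            {
                "DeviceName": device_name,
                "Ebs": {
                    "VolumeSize": size_gib,       # size in GiB
                    "VolumeType": "gp3",          # assumed volume type
                    "DeleteOnTermination": True,  # clean up with the instance
                },
            }
        ]
    }


# 100 GB leaves room for the OS plus a 12 GB image and its unpacked layers.
launch_template_data = root_volume_override(100)
print(json.dumps(launch_template_data, indent=2))
```

A dictionary of this shape could be passed as `LaunchTemplateData` when creating a new launch template version, for example with boto3's `create_launch_template_version`. The same value can also be set by hand in the console under the template's storage settings.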

Card type issue

Another case of tasks stuck in PROVISIONING was with Spot Instances. I wanted to save on inference costs, so I wanted to run Spot instead of On-Demand.

I set the instance filter to GPU accelerators and selected every instance type that had a GPU. The problem is that AWS also offers instances with non-NVIDIA GPUs. So the NVIDIA Linux driver got installed on AMD machines, and the tasks got stuck in the PROVISIONING state.

I disabled all the non-NVIDIA Spot instance types, and that fixed the problem.
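
The same filtering can be sketched in Python. The records below are hand-written and only mimic the shape of the EC2 `DescribeInstanceTypes` response (`GpuInfo` with a `Gpus` list carrying a `Manufacturer` field). The `nvidia_only` helper and the sample list are hypothetical names for this example, and the exact manufacturer strings should be verified against real API output.

```python
def nvidia_only(instance_types: list[dict]) -> list[str]:
    """Return the names of instance types whose GPUs are all NVIDIA.

    Each record is assumed to look like an entry from the EC2
    DescribeInstanceTypes response; types without GPUs are skipped.
    """
    keep = []
    for record in instance_types:
        gpus = record.get("GpuInfo", {}).get("Gpus", [])
        if gpus and all(g.get("Manufacturer", "").upper() == "NVIDIA" for g in gpus):
            keep.append(record["InstanceType"])
    return keep


# Illustrative sample data, not real API output.
sample = [
    {"InstanceType": "g4dn.xlarge",
     "GpuInfo": {"Gpus": [{"Name": "T4", "Manufacturer": "NVIDIA", "Count": 1}]}},
    {"InstanceType": "g4ad.xlarge",
     "GpuInfo": {"Gpus": [{"Name": "Radeon Pro V520", "Manufacturer": "AMD", "Count": 1}]}},
    {"InstanceType": "m5.large"},  # no GPU at all
]

allowed = nvidia_only(sample)
print(allowed)
```

The resulting list is what I would keep enabled in the Spot instance type selection, so an AMI with the NVIDIA driver only ever lands on NVIDIA hardware.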
