@wronk
Last active July 17, 2017 19:26
A basic tutorial on getting an Amazon EC2 instance running with Python for cloud computing
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cloud computing with AWS EC2 in python\n",
"Mark Wronkiewicz (Github: wronk)\n",
"\n",
"This guide covers the basic steps needed to start an instance (i.e., virtual computer) on Amazon Web Service's (AWS) Amazon Elastic Compute Cloud (EC2). It also discusses how to create a working python environment within an instance.\n",
"\n",
"You will learn about:\n",
"* configuring and starting a free instance on EC2\n",
"* accessing a running instance using secure shell (SSH)\n",
"* setting up a python environment in an instance\n",
"* EC2 vocab and available cloud computing capabilities\n",
"\n",
"Terminology in **bold** is defined in the Glossary at the end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup an account with AWS\n",
"\n",
"1. First, setup a free-tier account with AWS by clicking the \"Create and AWS Account\" button [here](https://aws.amazon.com).\n",
"\n",
" Notes on this process:\n",
" 1. You are able to use an existing Amazon.com account login if you have one\n",
" 1. This will be more annoying than a usual account setup -- there's a robo-call verification step\n",
" 1. You will have to enter a credit card, but *you won't be charged for anything in this tutorial*. Amazon wants payment information on record in hopes that you start using high-powered instances later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Spinning up an instance\n",
"\n",
"Let's start up a real instance! Amazon has a nice tutorial that I'll shamelessly offload you to, but please follow along in the summary below. There is some important terminology here, so please don't treat this request like the flight attendant instructing you to read the safety information card.\n",
"\n",
"In the [tutorial](https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine/), you'll need to:\n",
" 1. Open the **EC2 Console** in your browser. This is your dashboard for starting/managing instances.\n",
" 1. Configure an instance. \n",
" 1. Select an **instance image** (i.e., OS and software configuration). The virtual machine can be started with any number of operating systems (e.g., Linux, Windows, etc.). Right now stick with the recommended **Amazon Linux AMI** and check out the glossary if you want more info on this instance image. \n",
" 1. Select an **instance type** (i.e., hardware configuration). Many types are available catering to different needs in terms of CPU power, memory, GPU needs, and networking speed. The names are not terribly intuitive, but that's okay for now -- *choose the free **t2.micro*** which should be selected by default.\n",
" 1. Setup secure shell (SSH) access.\n",
" You need to create an SSH key pair so that you can remotely access your **virtual private cloud (VPC)**. This provides secure access to all instances you create now and in the future, so **save it in a place you won't forget, and treat it as a sensitive file** (i.e., don't email it out willy-nilly).\n",
" 1. Launch the instance\n",
" \n",
"## Connect to the instance\n",
" 1. Get the IP address.\n",
" You need to find your instance's IP address, so that the computer you're physically in front of knows how to find the instance on Amazon's servers. The IP is available under the \"Instances\" tab of the EC2 Console -- just scroll to the right in the instances table. Note that it may take a minute for the instance to boot up and have an IP assigned.\n",
" 1. SSH into instance\n",
" Follow the instructions to SSH into your instance using the SSH key configuration file you just downloaded. If you have trouble, see the abbreviate cheat sheet at the bottom of this guide or use Google to find a more in-depth tutorial.\n",
" \n",
" 1. **Have some fun before terminating the instance!** Once you've SSH'ed in type:\n",
" sudo yum install cowsay fortune-mod -y\n",
" cowsay I'm playing with command line cows on Amazon's dime.\n",
" fortune | cowsay\n",
" fortune | cowsay\n",
" \n",
" 1. Terminate the instance via EC2 console\n",
" \n",
"All done! You just spun up a simple instance on the cloud!\n",
"\n",
"If you missed it above, here's the [tutorial link](https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine/)\n"
]
},
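{
"cell_type": "markdown",
"metadata": {},
"source": [
"The connection step above boils down to a single `ssh` command, which can also be assembled in Python. This is only a minimal sketch and not part of Amazon's tutorial: `build_ssh_command` is a hypothetical helper, the IP address is a placeholder from the documentation range, and `ec2-user` and `ubuntu` are assumed to be the default login names for the Amazon Linux AMI and Ubuntu images.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Assumed default login names for two common instance images.\n",
"DEFAULT_USERS = {'amazon-linux': 'ec2-user', 'ubuntu': 'ubuntu'}\n",
"\n",
"\n",
"def build_ssh_command(key_path, host, image='amazon-linux'):\n",
"    '''Return the ssh command as an argument list (hypothetical helper).'''\n",
"    user = DEFAULT_USERS[image]\n",
"    # Expand ~ so the list can be handed straight to subprocess.call().\n",
"    key_path = os.path.expanduser(key_path)\n",
"    return ['ssh', '-i', key_path, '{0}@{1}'.format(user, host)]\n",
"\n",
"\n",
"# Placeholder key file and IP address; substitute your own values.\n",
"cmd = build_ssh_command('~/.ssh/MyKeyPair.pem', '203.0.113.10')\n",
"print(' '.join(cmd))\n",
"```\n",
"\n",
"Passing the resulting list to `subprocess.call` would open the session. Note that `ssh` refuses private keys that other users can read, so run `chmod 400` on the key file first."
]
},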
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up Python software on your instance\n",
"The default instance image we used above (**Amazon Linux AMI**) is pretty bare-bones. If you're trying to do scientific computing with Python on EC2, there are 2 main ways to get Python libraries installed on your instance.\n",
"\n",
"1. **Easy way: pick an instance image with desired packages pre-installed**\n",
" \n",
" If you're new to cloud computing, this is the simple option. There are many alternative instance images available on EC2 with different sets of software pre-installed. For example, there are several versions of Ubuntu (incl. 16.04) that come with Anaconda or deep learning packages. This can save time and money because you don't need to pay for a running instance while installing software.\n",
" \n",
" To go this route, go to your EC2 dashboard and click the \"Launch Instance\" button. Instead of selecting Amazon Linux AMI (as we did in the tutorial), click on the \"Community AMIs\" tab on the left side of the screen. Then type in \"Anaconda\" into the search bar and pick the configuration you want -- several of these are officially maintained by Continuum. Here's a [link to the in-depth guide from Continuum](https://docs.continuum.io/anaconda/user-guide/tasks/integration/amazon-aws) covering how to select one of these instance images on EC2. You can also search for \"TensorFlow\" to find pre-configured images suitable for deep learning.\n",
" \n",
" Look towards the end of this guide for a cheat sheet on using SSH to connect to an Amazon Linux AMI or Ubuntu instance.\n",
"\n",
"1. **Hard way: Installing software from scratch**\n",
" \n",
" I don't recommend this route unless want to more fully customize or optimize your instance.\n",
" 1. If you want to start from a clean install of an Ubuntu image, you should just follow your normal installation routines (via pip or from source).\n",
" 1. If, however, you're using Amazon's Linux AMI, things are a little more complicated. You'll need to install some dev-tools before being able to install Python modules with pip. Amazon's Linux AMI is heavily optimized to work on the cloud though, so you may obtain a performance boost by using it. Here are some existing guides:\n",
" 1. Simple guide for getting scikit-learn working on Amazon's Linux AMI [here](http://blog.adeel.io/2016/11/19/installing-pandas-scipy-numpy-and-scikit-learn-on-aws-ec2/)\n",
" 1. In-depth guide for Jupyter, Plotly.js, scikit-learn, and others as well as some bells and whistles (e.g., IAM Role, security group, swap space) [here](http://neuralfoundry.com/jupyter-plotly-pandas-scipy-numpy-and-scikit-learn-on-aws-ec2/)\n",
" 1. For guides to installing deep learning toolkits from scratch on Amazon's Linux AMI, google your desired setup (e.g., \"running tensorflow on aws linux ami gpu\". There are a ton of very specific guides, and new ones are constantly coming out.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Elastic block storage (EBS)\n",
"EBS is the EC2 mechanism to attach virtual drives to an instance. These drives are flexible and can be remounted to any instance. They aren't deleted when an instance is terminated making them useful for processing across the lifetime of 2+ instances (e.g., if you're running spot instances).\n",
"\n",
"EBS volumes are scalable between 1GB to 16TB and come in a [few different flavors](https://aws.amazon.com/ebs/pricing/). The standard \"gp2\" EBS space is $0.10 per GB-month. It is also possible to create \"snapshots\" of EBS volumes and store them on S3 for backup."
]
},
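{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gp2 price quoted above makes storage costs easy to estimate. The following is a minimal sketch that assumes the 0.10 USD per GB-month figure and the 1 GB to 16 TB size range from the text (prices change, so check the pricing page). `ebs_monthly_cost` is a hypothetical helper, and 1 TB is treated as 1024 GB here.\n",
"\n",
"```python\n",
"GP2_PRICE_PER_GB_MONTH = 0.10  # figure quoted in this guide; may be outdated\n",
"MIN_SIZE_GB = 1\n",
"MAX_SIZE_GB = 16 * 1024  # assumes 1 TB = 1024 GB\n",
"\n",
"\n",
"def ebs_monthly_cost(size_gb, months=1.0, price=GP2_PRICE_PER_GB_MONTH):\n",
"    '''Estimate the cost in dollars of keeping one volume for some months.'''\n",
"    if not MIN_SIZE_GB <= size_gb <= MAX_SIZE_GB:\n",
"        raise ValueError('volume size must be between 1 GB and 16 TB')\n",
"    return size_gb * months * price\n",
"\n",
"\n",
"# A 100 GB volume kept for half a month.\n",
"print('${0:.2f}'.format(ebs_monthly_cost(100, months=0.5)))\n",
"```\n",
"\n",
"Because the charge is per GB-month, a volume that outlives its instance keeps accruing cost until it is deleted."
]
},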
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instance types\n",
"There are a handful of different instance types, which correspond to different hardware configurations (optimizing some combo of CPU, memory, storage, or GPU needs). Within each type, there are several available sizes. Cost goes up with size.\n",
"\n",
"[Amazon's guide to each instance type](https://aws.amazon.com/ec2/instance-types/)\n",
"\n",
" \n",
"#### Relevant instance types for scientific computing\n",
"* t2\n",
" * Description: Standard CPU with potential to handle bursts of activity \n",
" * Use case: putzing around while learning how to use EC2, web apps, general purpose applications\n",
"* p2/g2\n",
" * Description: High end GPUs\n",
" * Use case: general purpose GPU work (e.g., TensorFlow)\n",
"* c4\n",
" * Description: High end CPUs\n",
" * Use case: good for batch processing, CPU-intensive work (e.g., scikit-learn gridsearch)\n",
" \n",
"#### Instance limits\n",
"Note that there are limits on how many instances you can spin up. These limits depend on your account and the instance type. You can request to have your limit increased as described [here](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instance type pricing\n",
"There are different pricing models available depending on your computing needs. For short-term research jobs, on-demand and spot instances are most relevant, and the latter should be the go-to for most jobs as it's typically much cheaper. That said, a full summary of each instance type is below.\n",
"Overview: https://aws.amazon.com/ec2/pricing/\n",
"Cost calculator: http://calculator.s3.amazonaws.com/index.html\n",
" \n",
"1. [On-demand instances](https://aws.amazon.com/ec2/pricing/on-demand/) are the most straight forward. You spin up whatever instance you want for a set price and only pay for the time the instance is running. Good for short, sparse jobs, but the on-demand convenience can be quite expensive.\n",
" \n",
"1. [Spot instances](https://aws.amazon.com/ec2/spot/). These operate based on an actively fluctuating bid price. If you place a bid for a specific instance type, an instance will be allocated to you whenever the market price is below your bid. **This is often much cheaper than on-demand pricing.** However, your applications must gracefully handle forced terminations -- your instance will be automatically shut off whenever the market price exceeds your bid.\n",
"\n",
"1. [Reserved instances](https://aws.amazon.com/ec2/pricing/reserved-instances/). These are for long-term computing needs (typically 1 year or 3 years) and are often cheaper than on-demand instances because of the time commitment. There is a reserved instance marketplace for buyers who ended up not needing all their compute power.\n",
"\n",
"1. [Dedicated instance](https://aws.amazon.com/ec2/purchasing-options/dedicated-instances/). These instances are on hardware that's physically separate from other AWS users.\n",
"\n",
"1. [Dedicated host](https://aws.amazon.com/ec2/dedicated-hosts/). This allows you to purchase time on an entire physical server. A dedicated host is use useful for complying with strict software licenses that only allow a single machine to use the license. You're also able to control how individual instances are created on this server.\n"
]
},
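{
"cell_type": "markdown",
"metadata": {},
"source": [
"The spot-market rule described above (an instance runs while the market price is below your bid and is shut off when the price exceeds it) can be simulated in a few lines. This is a sketch with made-up hourly prices: `simulate_spot` is a hypothetical helper, none of the numbers are real AWS prices, and billing is simplified to the market price for each full hour the instance runs.\n",
"\n",
"```python\n",
"def simulate_spot(bid, hourly_prices):\n",
"    '''Return (hours_run, total_cost, interruptions) for one bid.'''\n",
"    hours_run = 0\n",
"    total_cost = 0.0\n",
"    interruptions = 0\n",
"    running = False\n",
"    for price in hourly_prices:\n",
"        if price <= bid:  # assumption: a tie counts as running\n",
"            hours_run += 1\n",
"            total_cost += price  # simplified: pay the market price, not the bid\n",
"            running = True\n",
"        else:\n",
"            if running:\n",
"                interruptions += 1  # forced termination of a running instance\n",
"            running = False\n",
"    return hours_run, total_cost, interruptions\n",
"\n",
"\n",
"# Hypothetical market prices in dollars per hour, not real AWS data.\n",
"prices = [0.03, 0.04, 0.12, 0.05, 0.03, 0.15, 0.04]\n",
"hours, cost, stops = simulate_spot(bid=0.06, hourly_prices=prices)\n",
"print('{0} hours, {1:.2f} dollars, {2} interruptions'.format(hours, cost, stops))\n",
"```\n",
"\n",
"Each interruption in the simulation corresponds to a forced termination, which is why spot jobs should save intermediate results somewhere that outlives the instance, such as an EBS volume."
]
},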
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion and more resources\n",
"\n",
"That's it! In this guide, we learned about the different types of EC2 instances, how to spin up an individual EC2 instances, how to issue commands via SSH, and how to get a python environment up a running. To ideas on what to learn next, see the below steps.\n",
"\n",
"#### Cloud formation Cluster (CfnCluster)\n",
"For advanced compute jobs, you may want to spin up several instances that work in concert to handle a more intensive job. Amazon **CfnCluster** is a good way to accomplish this without having to manually initialize every instance. CfnCluster allows you to define a **template** file that is essentially a text file defining the type and number of virtual machines you want to spin up. CfnCluster can the coordinate dozens of instances (collectively referred to as a \"stack\") to carry out **high-performance computing** (HPC).\n",
"\n",
"#### HPC Resources\n",
"* Amazon put out its own guide for HPC with CfnCluster [here](https://www.youtube.com/watch?v=FEIkzg72D5s). The presenter starts with an overview, then goes on to describe configuring a cluster (~12:20), and demos how to actually launch a cluster starting at 21:00.\n",
"* Overview of HPC [here](https://aws.amazon.com/s/dm/optimization/server-side-test/hpc/)\n",
"* Overview of CfnCluster [here](https://aws.amazon.com/s/dm/optimization/server-side-test/hpc/cfncluster/). Click on the \"Getting Started\" link for more instructions.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glossary\n",
"Full AWS glossary [here](http://docs.aws.amazon.com/general/latest/gr/glos-chap.html)\n",
"\n",
"Some relevant terms:\n",
"\n",
"* **AMI**: “Amazon machine image.” This is Amazon's own distro of linux (though it was originally based on [RHEL 5 and 6](https://forums.aws.amazon.com/thread.jspa?threadID=51647)). It is specifically optimized to work well on EC2 in terms of stability, security, and performance and has many useful EC2 tools already installed on it.\n",
"* **CfnCluster**: Amazon's open source tool for managing a cluster of computing resources. This is often used by researchers that are carrying out HPC (e.g., fluid dynamics, weather modeling, genomics analysis).\n",
"* **EBS**: “elastic block storage” persistent block storage for Amazon instances that can be reattached to new instances on demand. Useful because this storage system is not deleted when the instance it's attached to is terminated.\n",
"* **EC2**: “elastic compute 2”; name of over-arching service for cloud computing\n",
"* **EC2 Console**: This dashboard allows you to launch an instance, monitor your running instances, and manage other AWS details\n",
"* **ECU**: “elastic compute unit” standard metric for CPU and memory capability\n",
"* **Elastic IP**: IP address that is assigned to your account and points to an instance; allows your code to quickly reconnect with an AMI instance w/out manually re-updating the instance IP address. Free if instance is in use, small charge otherwise to prevent people from hoarding them\n",
"* **HPC**: \"High-performance computing\"; Computing that is compute- or data-intensive. CfnCluster is a tool for HPC using Amazon's cloud\n",
"* **Instance**: virtual machine on Amazon’s servers; On EC2, this is what is created/booted up\n",
"* **Instance image**: The OS and software setup of your instance. Examples include Amazon Linux AMI, Ubuntu Server 16.04, Windows Server 2016, etc.\n",
"* **Instance type**: The hardware configuration for the instance. There are many options that trade off between CPU power, memory, network speed, and GPU availability. Examples include t2.micro, c4.8xlarge, p2.16xlarge, etc.\n",
"* **S3**: “simple storage service” ‒ Amazon's cloud data repository that’s easy to use/scalable\n",
"* **VPC**: “virtual private cloud” ‒ cluster of instances that you have access to (via SSH) and are isolated from the rest of Amazon’s mega-network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SSH Cheat sheet\n",
"For Amazon Linux AMI: \n",
"* ssh -i ~/.ssh/MyKeyPair.pem [email protected]\n",
" \n",
"For Ubuntu: \n",
"* ssh -i ~/.ssh/MyKeyPair.pem [email protected]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
wronk commented Jul 17, 2017

TODO:

  • Typos
  • Stress how much cheaper spot instances can be
  • Add CfnCluster
