import boto3

trainingJobName = "xgboost-2019-05-09-15-20-51-276"
print(trainingJobName)

# Look up the training job and retrieve the container image it was trained with
sm = boto3.client("sagemaker")
job = sm.describe_training_job(TrainingJobName=trainingJobName)
trainingImage = job['AlgorithmSpecification']['TrainingImage']
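The training image and model artefact retrieved above can then be fed into the CloudFormation template shown next. A minimal sketch, assuming the template is stored locally; the stack name and template file name are illustrative placeholders:

import boto3

cfn = boto3.client("cloudformation")

# Model artefact produced by the training job (same describe_training_job response as above)
modelDataUrl = job['ModelArtifacts']['S3ModelArtifacts']

# 'endpoint-one-model.yml' and the stack name are hypothetical placeholders
with open('endpoint-one-model.yml') as f:
    cfn.create_stack(
        StackName='xgboost-endpoint',
        TemplateBody=f.read(),
        Parameters=[
            {'ParameterKey': 'ModelName',     'ParameterValue': trainingJobName},
            {'ParameterKey': 'ModelDataUrl',  'ParameterValue': modelDataUrl},
            {'ParameterKey': 'TrainingImage', 'ParameterValue': trainingImage}
        ]
    )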
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  ModelName:
    Description: Model name
    Type: String
  ModelDataUrl:
    Description: Location of model artefact
    Type: String
  TrainingImage:
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
endpoint_name = job_name_prefix + '-ep-' + timestamp
print('Endpoint name: {}'.format(endpoint_name))

endpoint_params = {
    'EndpointName': endpoint_name,
    'EndpointConfigName': endpoint_config_name,
}
endpoint_response = sagemaker.create_endpoint(**endpoint_params)
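create_endpoint is asynchronous, so the endpoint is not usable immediately. A small sketch for blocking until it is live, assuming 'sagemaker' is the boto3 SageMaker client used above:

# Wait for the endpoint to reach the InService state
waiter = sagemaker.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

status = sagemaker.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
print('Endpoint status: {}'.format(status))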
job_name_prefix = 'DEMO-imageclassification'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
endpoint_config_name = job_name_prefix + '-epc-' + timestamp

endpoint_config_response = sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'InstanceType': 'ml.m4.xlarge',
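The excerpt above stops inside ProductionVariants. For reference, a complete variant definition also names the model and the instance count; 'model_name' below is a hypothetical reference to a model created earlier with create_model (not shown in the excerpt):

endpoint_config_response = sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'InstanceType': 'ml.m4.xlarge',
            'InitialInstanceCount': 1,
            'ModelName': model_name,       # hypothetical: created with create_model
            'VariantName': 'AllTraffic'
        }
    ]
)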
Hyper parameters: {'epochs': '10', 'lr': '0.01', 'batch_size': '128'}
Input parameters: {'training': {'RecordWrapperType': 'None', 'TrainingInputMode': 'File', 'S3DistributionType': 'FullyReplicated'}, 'validation': {'RecordWrapperType': 'None', 'TrainingInputMode': 'File', 'S3DistributionType': 'FullyReplicated'}}
Train Epoch: 1 [0/60000 (0%)]   Loss: 2.349514
Train Epoch: 1 [1280/60000 (2%)]   Loss: 2.296775
Train Epoch: 1 [2560/60000 (4%)]   Loss: 2.258955
Train Epoch: 1 [3840/60000 (6%)]   Loss: 2.243712
Train Epoch: 1 [5120/60000 (9%)]   Loss: 2.108034
Train Epoch: 1 [6400/60000 (11%)]   Loss: 1.979539
...
Train Epoch: 10 [56320/60000 (94%)]   Loss: 0.178176
import os
import json

# SageMaker paths
prefix = '/opt/ml/'
param_path = os.path.join(prefix, 'input/config/hyperparameters.json')
data_path = os.path.join(prefix, 'input/config/inputdataconfig.json')

# Read hyper parameters passed by SageMaker
with open(param_path, 'r') as params:
    hyperParams = json.load(params)
lr = float(hyperParams.get('lr', '0.1'))
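The input data configuration written by SageMaker can be read the same way; this is what the 'Input parameters:' lines in the training logs above correspond to (sketch, same assumptions as the snippet above):

# Read the input data configuration passed by SageMaker
with open(data_path, 'r') as config:
    inputParams = json.load(config)
print('Input parameters: {}'.format(inputParams))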
output_path = 's3://{}/{}/output'.format(sess.default_bucket(), repo_name)
image_name = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, repo_name)
print(output_path)
print(image_name)

estimator = sagemaker.estimator.Estimator(
    image_name=image_name,
    base_job_name=base_job_name,
    role=role,
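With the estimator configured against the custom image, training is launched by passing the S3 locations of the channels declared in the input data configuration. A sketch, where the channel prefixes are illustrative assumptions:

estimator.fit({
    'training':   's3://{}/{}/training'.format(sess.default_bucket(), repo_name),
    'validation': 's3://{}/{}/validation'.format(sess.default_bucket(), repo_name)
})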
import os
import torch.utils.data as data

# SageMaker paths
prefix = '/opt/ml/'
input_path = os.path.join(prefix, 'input/data/')

# Adapted from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
class MyMNIST(data.Dataset):
    def __init__(self, train=True, transform=None, target_transform=None):
        self.transform = transform
        self.target_transform = target_transform
        self.train = train  # training set or test set
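Inside the training script, the dataset would typically be wrapped in a DataLoader; a sketch, with an illustrative batch size and transform:

from torchvision import transforms

# Build a loader over the custom dataset read from the SageMaker input path
train_loader = data.DataLoader(
    MyMNIST(train=True, transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)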
FROM nvidia/cuda:9.0-runtime

RUN apt-get update && \
    apt-get -y install build-essential python-dev python3-dev python3-pip python-imaging wget curl

COPY mnist_cnn.py /opt/program/train
RUN chmod +x /opt/program/train

RUN pip3 install http://download.pytorch.org/whl/cu90/torch-0.4.0-cp35-cp35m-linux_x86_64.whl --upgrade && \
    pip3 install torchvision --upgrade
Using MXNet backend
Hyper parameters: {'lr': '0.01', 'batch_size': '256', 'epochs': '10', 'gpus': '2'}
Input parameters: {'validation': {'S3DistributionType': 'FullyReplicated', 'TrainingInputMode': 'File', 'RecordWrapperType': 'None'}, 'training': {'S3DistributionType': 'FullyReplicated', 'TrainingInputMode': 'File', 'RecordWrapperType': 'None'}}
Files loaded
x_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
/usr/local/lib/python3.5/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.00390625). Is this intended?