
Julien Simon (@juliensimon)

import boto3

# Look up an existing SageMaker training job and retrieve
# the container image it was trained with
trainingJobName = "xgboost-2019-05-09-15-20-51-276"
print(trainingJobName)
sm = boto3.client("sagemaker")
job = sm.describe_training_job(TrainingJobName=trainingJobName)
trainingImage = job['AlgorithmSpecification']['TrainingImage']
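Given a response shaped like `describe_training_job`'s, the registry account and region can be recovered from the ECR image URI itself. A minimal sketch with a hard-coded stand-in response (the job values below are illustrative, not from a real call):

```python
# Stand-in for the describe_training_job() response; values are illustrative
job = {
    'AlgorithmSpecification': {
        'TrainingImage': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest'
    }
}
trainingImage = job['AlgorithmSpecification']['TrainingImage']

# ECR URIs follow <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
registry = trainingImage.split('/')[0]
account = registry.split('.')[0]   # '811284229777'
region = registry.split('.')[3]    # 'us-east-1'
print(account, region)
```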
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  ModelName:
    Description: Model name
    Type: String
  ModelDataUrl:
    Description: Location of model artefact
    Type: String
  TrainingImage:
juliensimon / smdeploy-2.py
Created July 26, 2018 17:37
import time

# Name the endpoint with a timestamp so repeated deployments don't collide
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
endpoint_name = job_name_prefix + '-ep-' + timestamp
print('Endpoint name: {}'.format(endpoint_name))
endpoint_params = {
    'EndpointName': endpoint_name,
    'EndpointConfigName': endpoint_config_name,
}
endpoint_response = sagemaker.create_endpoint(**endpoint_params)
juliensimon / smdeploy-1.py
Last active December 9, 2018 14:13
import time

job_name_prefix = 'DEMO-imageclassification'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
endpoint_config_name = job_name_prefix + '-epc-' + timestamp
endpoint_config_response = sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'InstanceType': 'ml.m4.xlarge',
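Both smdeploy snippets build names the same way; pinning the clock to the epoch makes the scheme easy to check (the prefix is the gist's own example value):

```python
import time

# Same naming scheme as the gists above, with the clock pinned
# to the epoch so the result is reproducible
job_name_prefix = 'DEMO-imageclassification'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime(0))
endpoint_config_name = job_name_prefix + '-epc-' + timestamp
endpoint_name = job_name_prefix + '-ep-' + timestamp
print(endpoint_config_name)  # DEMO-imageclassification-epc--1970-01-01-00-00-00
```

Note that the format string itself starts with `-`, so the generated names contain a double hyphen (`-epc--2018-...`); harmless, but worth knowing when filtering endpoints by name.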
juliensimon / pytorch-log
Created June 2, 2018 08:21
Hyper parameters: {'epochs': '10', 'lr': '0.01', 'batch_size': '128'}
Input parameters: {'training': {'RecordWrapperType': 'None', 'TrainingInputMode': 'File', 'S3DistributionType': 'FullyReplicated'}, 'validation': {'RecordWrapperType': 'None', 'TrainingInputMode': 'File', 'S3DistributionType': 'FullyReplicated'}}
Train Epoch: 1 [0/60000 (0%)] Loss: 2.349514
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.296775
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.258955
Train Epoch: 1 [3840/60000 (6%)] Loss: 2.243712
Train Epoch: 1 [5120/60000 (9%)] Loss: 2.108034
Train Epoch: 1 [6400/60000 (11%)] Loss: 1.979539
...
Train Epoch: 10 [56320/60000 (94%)] Loss: 0.178176
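The loss curve can be recovered from log lines in this format; a small parser sketch (the regex and the three sample lines are taken from the log above, not from the gist's code):

```python
import re

# Three lines copied from the training log above
log = """Train Epoch: 1 [0/60000 (0%)] Loss: 2.349514
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.296775
Train Epoch: 10 [56320/60000 (94%)] Loss: 0.178176"""

# Extract (epoch, samples_seen, loss) from each progress line
pattern = re.compile(r'Train Epoch: (\d+) \[(\d+)/\d+ \(\d+%\)\]\s+Loss: ([\d.]+)')
points = [(int(e), int(s), float(l)) for e, s, l in pattern.findall(log)]
print(points[0])   # (1, 0, 2.349514)
print(points[-1])  # (10, 56320, 0.178176)
```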
juliensimon / pytorch-params.py
Last active June 2, 2018 08:11
import json
import os

# SageMaker paths
prefix = '/opt/ml/'
param_path = os.path.join(prefix, 'input/config/hyperparameters.json')
data_path = os.path.join(prefix, 'input/config/inputdataconfig.json')

# Read hyper parameters passed by SageMaker (all values arrive as strings)
with open(param_path, 'r') as params:
    hyperParams = json.load(params)
lr = float(hyperParams.get('lr', '0.1'))
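SageMaker serializes every hyperparameter value as a string, so each one has to be cast back. A self-contained sketch using a temp file as a stand-in for the real /opt/ml location:

```python
import json
import os
import tempfile

# Stand-in for /opt/ml/input/config/hyperparameters.json:
# SageMaker writes every value as a string
with tempfile.TemporaryDirectory() as tmp:
    param_path = os.path.join(tmp, 'hyperparameters.json')
    with open(param_path, 'w') as f:
        json.dump({'epochs': '10', 'lr': '0.01', 'batch_size': '128'}, f)

    with open(param_path, 'r') as params:
        hyperParams = json.load(params)

    # Cast back to the types the training script needs, with defaults
    lr = float(hyperParams.get('lr', '0.1'))
    epochs = int(hyperParams.get('epochs', '1'))
    batch_size = int(hyperParams.get('batch_size', '64'))
```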
juliensimon / pytorch-estimator.py
Created June 2, 2018 08:03
import sagemaker

# Build the S3 output location and the ECR image URI for the custom container
output_path = 's3://{}/{}/output'.format(sess.default_bucket(), repo_name)
image_name = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, repo_name)
print(output_path)
print(image_name)
estimator = sagemaker.estimator.Estimator(
    image_name=image_name,
    base_job_name=base_job_name,
    role=role,
juliensimon / pytorch-load.py
Last active June 2, 2018 07:57
import os
import torch.utils.data as data

# SageMaker paths
prefix = '/opt/ml/'
input_path = os.path.join(prefix, 'input/data/')

# Adapted from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
class MyMNIST(data.Dataset):
    def __init__(self, train=True, transform=None, target_transform=None):
        self.transform = transform
        self.target_transform = target_transform
        self.train = train  # training set or test set
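The Dataset contract MyMNIST implements boils down to `__len__` plus `__getitem__`. A torch-free sketch of the same pattern (the class and sample data here are illustrative, not from the gist):

```python
class ListDataset:
    """Minimal stand-in for torch.utils.data.Dataset: index-based access
    plus an optional transform, the same shape MyMNIST follows."""

    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        x, y = self.samples[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

ds = ListDataset([(1, 'one'), (2, 'two')], transform=lambda v: v * 10)
print(len(ds))  # 2
print(ds[0])    # (10, 'one')
```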
juliensimon / pytorch-Dockerfile
Created June 2, 2018 07:45
FROM nvidia/cuda:9.0-runtime
# Build tools and Python 3
RUN apt-get update && \
    apt-get -y install build-essential python-dev python3-dev python3-pip python-imaging wget curl
# PyTorch 0.4.0 (CUDA 9.0 build) and torchvision
RUN pip3 install http://download.pytorch.org/whl/cu90/torch-0.4.0-cp35-cp35m-linux_x86_64.whl --upgrade && \
    pip3 install torchvision --upgrade
# SageMaker launches the container with the "train" argument,
# so the script must be an executable named "train" on the PATH
COPY mnist_cnn.py /opt/program/train
RUN chmod +x /opt/program/train
ENV PATH="/opt/program:${PATH}"
juliensimon / kerasmxnet-run.log
Created May 30, 2018 17:46
Using MXNet backend
Hyper parameters: {'lr': '0.01', 'batch_size': '256', 'epochs': '10', 'gpus': '2'}
Input parameters: {'validation': {'S3DistributionType': 'FullyReplicated', 'TrainingInputMode': 'File', 'RecordWrapperType': 'None'}, 'training': {'S3DistributionType': 'FullyReplicated', 'TrainingInputMode': 'File', 'RecordWrapperType': 'None'}}
Files loaded
x_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
/usr/local/lib/python3.5/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.00390625). Is this intended?
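The numbers in that warning are consistent with the hyperparameters logged above: with `batch_size = 256` and a single worker, MXNet expects `rescale_grad = 1/256 = 0.00390625`, while the manually created optimizer left it at the default 1.0. A quick check of the arithmetic behind the message:

```python
# Values from the logged hyperparameters; num_workers assumed to be 1
batch_size = 256
num_workers = 1

# The normalization MXNet's warning says it expected
expected = 1.0 / batch_size / num_workers
print(expected)  # 0.00390625, vs. the 1.0 the optimizer was created with
```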