Jupyter on EMR allows users to save their work on Amazon S3 rather than on local storage on the EMR cluster (master node).
To store notebooks on S3, use:
--notebook-dir <s3://your-bucket/folder/>
To store notebooks in a directory different from the user’s home directory, use:
--notebook-dir <local directory>
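Here is a minimal sketch for building the --bootstrap-actions JSON yourself. The helper build_bootstrap_actions is hypothetical and not part of the bootstrap script. It inserts a notebook directory into the BA's Args and reports whether notebooks will be saved to S3 or stay on the master node. It passes only --notebook-dir, so add the other BA arguments you need. It also does no JSON escaping, so the path must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Minimal sketch: build the --bootstrap-actions JSON for the Jupyter BA with a
# chosen notebook directory. build_bootstrap_actions is a hypothetical helper.
# Assumption: the directory contains no double quotes or backslashes
# (no JSON escaping is done).

BA_SCRIPT="s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh"

build_bootstrap_actions() {
  notebook_dir="$1"
  # An s3:// path keeps notebooks on Amazon S3; anything else is treated
  # as a local directory on the master node.
  case "$notebook_dir" in
    s3://*) echo "storing notebooks on Amazon S3: $notebook_dir" >&2 ;;
    *)      echo "storing notebooks on the master node: $notebook_dir" >&2 ;;
  esac
  # Only --notebook-dir is passed; append further BA arguments as needed.
  printf '[{"Path":"%s","Args":["--notebook-dir","%s"],"Name":"Install Jupyter notebook"}]\n' \
    "$BA_SCRIPT" "$notebook_dir"
}
```

For example, pass --bootstrap-actions "$(build_bootstrap_actions s3://your-bucket/folder/)" to aws emr create-cluster.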
The following example CLI command launches a three-node EMR 5.6.0 cluster (one master and two core nodes, all m3.xlarge) with the bootstrap action (BA). The BA installs the kernels and package bundles selected by its arguments (--toree, --python3, --ds-packages, --ml-packages). It also installs the pandas and ggplot Python packages and sets:
the Jupyter port to 8880
the password to jupyter (this needs "--password","jupyter" in the BA Args, which the command below does not include)
the JupyterHub port to 8001
aws emr create-cluster \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  --applications Name=Hadoop Name=Hive Name=Pig Name=Hue Name=Spark Name=Ganglia Name=Presto Name=Tez \
  --bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh","Args":["--toree","--ds-packages","--ml-packages","--python-packages","pandas ggplot","--port","8880","--jupyterhub","--jupyterhub-port","8001","--spark-opts","--packages=com.typesafe:config:1.3.1,org.datasyslab:geospark:0.8.0,com.vividsolutions:jts:1.13,com.databricks:spark-avro_2.11:3.0.0,org.elasticsearch:elasticsearch-spark_2.11:2.4.0","--notebook-dir","s3://yuan.mobiquitynetworks.com/workspace/","--cached-install","--s3fs","--python3"],"Name":"Install Jupyter notebook"}]' \
  --ec2-attributes '{"KeyName":"<your-ec2-key>","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-1b58686f","EmrManagedSlaveSecurityGroup":"sg-2418c05e","EmrManagedMasterSecurityGroup":"sg-79e63e03"}' \
  --service-role EMR_DefaultRole \
  --enable-debugging \
  --release-label emr-5.6.0 \
  --log-uri 's3n://aws-logs-452442550777-us-west-2/elasticmapreduce/' \
  --name 'Jupyter Notebook' \
  --instance-groups '[{"InstanceCount":2,"InstanceGroupType":"CORE","InstanceType":"m3.xlarge","Name":"Core - 2"},{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"m3.xlarge","Name":"Master - 1"}]' \
  --scale-down-behavior TERMINATE_AT_INSTANCE_HOUR \
  --region us-west-2
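Replace <your-ec2-key> with the name of your EC2 key pair. Replace the --notebook-dir bucket (s3://yuan.mobiquitynetworks.com/workspace/) with the S3 bucket where you store notebooks. The subnet ID, security-group IDs, and log URI also belong to the original author's account, so substitute your own. You can also change the instance types to suit your needs and budget.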
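One way to keep the values you must replace in one place is to set them as shell variables before building the command. This is a sketch that assumes bash. The variable names and the run helper are illustrative, and every default is a placeholder. The command is trimmed for brevity: it has fewer applications and only a subset of the BA arguments, and it omits the auto-scaling role, debugging, and the security-group IDs (EMR should then fall back to its default managed security groups). By default it only prints the arguments, one per line. Set RUN=1 to actually call aws emr create-cluster.

```shell
#!/usr/bin/env bash
# Sketch: keep the account-specific values of the create-cluster command in
# variables. Every default below is a placeholder to replace with your own.
KEY_NAME="${KEY_NAME:-<your-ec2-key>}"                       # EC2 key pair name
SUBNET_ID="${SUBNET_ID:-<your-subnet-id>}"
NOTEBOOK_DIR="${NOTEBOOK_DIR:-s3://<your-bucket>/<folder>/}"
LOG_URI="${LOG_URI:-s3://<your-log-bucket>/elasticmapreduce/}"
REGION="${REGION:-us-west-2}"

BA_SCRIPT="s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5-latest.sh"
# Subset of the BA arguments from the full command; append others as needed.
BA_JSON='[{"Path":"'"$BA_SCRIPT"'","Args":["--toree","--python3","--port","8880","--jupyterhub","--jupyterhub-port","8001","--notebook-dir","'"$NOTEBOOK_DIR"'"],"Name":"Install Jupyter notebook"}]'
# Security-group IDs are left out; add them back if you need specific groups.
EC2_JSON='{"KeyName":"'"$KEY_NAME"'","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"'"$SUBNET_ID"'"}'

# Print the arguments one per line for review; execute only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$@"
  fi
}

run aws emr create-cluster \
  --name "Jupyter Notebook" \
  --release-label emr-5.6.0 \
  --applications Name=Hadoop Name=Spark \
  --bootstrap-actions "$BA_JSON" \
  --ec2-attributes "$EC2_JSON" \
  --service-role EMR_DefaultRole \
  --log-uri "$LOG_URI" \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
                    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
  --region "$REGION"
```

Printing by default means a stray run of the script cannot launch (and bill for) a cluster until you have checked the substituted values.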