This is how I quickly got an Apache Zeppelin notebook running against an AWS Glue Dev endpoint. None of the guides out there seemed concise, and I found some custom Docker containers doing what you can easily do yourself. This approach keeps you in control: it sets up port forwarding and runs the official Docker image.
- Create your Glue Dev endpoint (this involves creating a keypair; I just used ssh-keygen)
- Once READY, select it and copy the "SSH tunnel to remote interpreter"
- e.g.: ssh -i <private-key.pem> -vnNT -L :9007:169.254.76.1:9007 glue@<ec2-endpoint>.<region>.compute.amazonaws.com
- Connect to the endpoint in a terminal session, modifying the above to match:
ssh -i ~/.ssh/glue-dev -vnNT -L :9007:127.0.0.1:9007 glue@<ec2-endpoint>.<region>.compute.amazonaws.com
- Run the Apache Zeppelin Docker container
docker run -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' --name zeppelin apache/zeppelin:0.7.3
- Update your interpreters to use the existing process (the AWS Glue endpoint).
- Find the interpreter of choice
- Click edit in the top right
- Check "Connect to existing process"
- Set Host to:
host.docker.internal
- Set Port to:
9007
- You should now be able to create a notebook and get started!
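The command-line steps above can be collected into a few shell functions. This is only a rough sketch: the variable names (KEY, ENDPOINT, PORT, DRY_RUN) and function names are made up for illustration, and the endpoint default is a placeholder you would replace with your own. By default it only prints the commands, so you can check them before running anything.

```shell
# Sketch only: KEY, ENDPOINT, PORT, DRY_RUN and the function names are
# placeholders invented for this example; the commands are the ones from the steps.
KEY="${KEY:-$HOME/.ssh/glue-dev}"
ENDPOINT="${ENDPOINT:-<ec2-endpoint>.<region>.compute.amazonaws.com}"
PORT="${PORT:-9007}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the keypair whose public half is given to the dev endpoint.
make_key() {
  run ssh-keygen -f "$KEY"
}

# Forward local port 9007 to the endpoint. The empty bind address (the
# leading colon) asks ssh to listen on all interfaces, not just loopback,
# which is what lets the container reach it via host.docker.internal.
start_tunnel() {
  run ssh -i "$KEY" -vnNT -L ":${PORT}:127.0.0.1:${PORT}" "glue@${ENDPOINT}"
}

# Run the official Zeppelin image, keeping logs and notebooks under $PWD.
start_zeppelin() {
  run docker run -p 8080:8080 --rm \
    -v "$PWD/logs:/logs" -v "$PWD/notebook:/notebook" \
    -e ZEPPELIN_LOG_DIR=/logs -e ZEPPELIN_NOTEBOOK_DIR=/notebook \
    --name zeppelin apache/zeppelin:0.7.3
}
```

To use it for real, source the file, set DRY_RUN=0 and ENDPOINT to your endpoint's address, then call start_tunnel in one terminal and start_zeppelin in another. The tunnel stays in the foreground, which is why the two need separate terminals.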
It is failing with the error below:
java.lang.RuntimeException: Fail to callRemoteFunction, because connection is broken
at org.apache.zeppelin.interpreter.remote.PooledRemoteClient.callRemoteFunction(PooledRemoteClient.java:108)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:98)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:159)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:126)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:271)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:444)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:72)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)