@ZacBlanco
Last active July 5, 2018 18:51
Connect to Spark UI using an SSH Proxy for HPC Clusters

So you want to use the Spark UI?

HPC clusters typically don't have great support for long-running Spark clusters, so the cluster and its UI are short-lived. Monitoring Spark jobs without the UI is also more difficult.

All you have to do is open a SOCKS proxy over SSH and tell Firefox (or your preferred browser) to send its traffic through it.

Steps

1 - Get an SSH tunnel

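I chose to use port 8080 for my SOCKS proxy. In the command, -D 8080 opens the SOCKS proxy on local port 8080, -C compresses the traffic, and -N skips running a remote command. Substitute your own username and the cluster's login host for the placeholder.

ssh -D 8080 -C -N <user>@<login-host>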
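If you open this tunnel often, a small helper can assemble the command for you. The following is a minimal sketch, not part of the original setup: the function name `socks_tunnel_cmd` and the `user@login-host` target are hypothetical placeholders, and the port defaults to the 8080 used above.

```shell
# Sketch: print the ssh command that opens a SOCKS proxy tunnel.
# socks_tunnel_cmd is a hypothetical helper name.
#   $1 = ssh target, e.g. user@login-host (placeholder, not a real host)
#   $2 = local SOCKS port (optional, defaults to 8080 as in this guide)
socks_tunnel_cmd() {
  # -D: open a local SOCKS proxy on the given port
  # -C: compress the traffic
  # -N: do not run a remote command, only forward
  printf 'ssh -D %s -C -N %s\n' "${2:-8080}" "$1"
}

# Usage (commented out so nothing connects by accident):
# socks_tunnel_cmd user@login-host        # prints the command for port 8080
# socks_tunnel_cmd user@login-host 9090   # prints the command for port 9090
```

Printing the command rather than running it lets you check the target and port before anything connects.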

2 - Configure the Browser

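I use Firefox, so I go to Firefox -> Preferences -> Proxy.

Click Manual Proxy Configuration -> SOCKS Host. The host is localhost and the port is 8080.

I also enabled "Enable DNS via SOCKSv5 Proxy" (labeled "Proxy DNS when using SOCKS v5" in recent Firefox versions). This sends DNS lookups through the tunnel, so hostnames that only resolve inside the cluster work in the browser.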
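The same settings can also be written as Firefox preferences instead of clicking through the dialog. The sketch below prints the corresponding `user_pref` lines, assuming you append them to a `user.js` file in your Firefox profile directory. The function name `firefox_socks_prefs` is hypothetical, and the profile path in the usage comment is a placeholder you would need to look up yourself.

```shell
# Sketch: print Firefox preferences matching the manual SOCKS setup above.
# firefox_socks_prefs is a hypothetical helper name.
#   $1 = local SOCKS port (optional, defaults to 8080 as in this guide)
firefox_socks_prefs() {
  cat <<EOF
user_pref("network.proxy.type", 1);
user_pref("network.proxy.socks", "localhost");
user_pref("network.proxy.socks_port", ${1:-8080});
user_pref("network.proxy.socks_version", 5);
user_pref("network.proxy.socks_remote_dns", true);
EOF
}

# Usage (the profile directory is a placeholder, adjust it to your system):
# firefox_socks_prefs 8080 >> /path/to/firefox-profile/user.js
```

Here `network.proxy.type` set to 1 selects manual proxy configuration, and `network.proxy.socks_remote_dns` corresponds to the DNS checkbox. Firefox reads `user.js` at startup, so restart the browser after changing it.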

3 - Connecting

Finally we can connect to the UI to monitor the jobs.

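In the browser, go to http://comet-node-num:4040 and voila! The Spark UI should appear. Here comet-node-num stands for the hostname of the node running your Spark driver, and 4040 is the default port of the Spark UI. Don't close the terminal running the SSH command, though, or you'll lose the proxy.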
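To check the tunnel without a browser, you can request the same URL with curl through the proxy. This is a minimal sketch under a few assumptions: the helper name `spark_ui_url` is hypothetical, `comet-node-num` is the same placeholder as above, and the tunnel from step 1 is listening on localhost:8080.

```shell
# Sketch: build the Spark UI URL for a given node.
# spark_ui_url is a hypothetical helper name.
#   $1 = hostname of the node running the Spark driver
#   $2 = UI port (optional, defaults to Spark's default of 4040)
spark_ui_url() {
  printf 'http://%s:%s\n' "$1" "${2:-4040}"
}

# Usage (commented out because it needs the tunnel and a running job):
# --socks5-hostname makes curl resolve the hostname through the proxy,
# which mirrors the DNS setting chosen in the browser.
# curl --socks5-hostname localhost:8080 "$(spark_ui_url comet-node-num)"
# curl --socks5-hostname localhost:8080 "$(spark_ui_url comet-node-num)/api/v1/applications"
```

The second request targets Spark's monitoring REST API, which returns JSON and is easier to script against than the HTML pages.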
