@justdoit0823
Last active December 30, 2018 01:43
Hadoop utilities in Python.
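# Sum the file sizes listed on the HDFS web UI (saved locally as tab-separated text).
def hdfs_file_size(path):
    # Each non-empty row is tab-separated; the 4th column holds the size, such as "1.5 MB".
    with open(path) as f:
        rows = [line for line in f.read().split('\n') if line]
    # Only the numeric part is kept; unit suffixes are ignored, so rows must share one unit.
    sizes = (row.split('\t')[3].split(' ')[0] for row in rows)
    return sum(map(float, sizes))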
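`hdfs_file_size` adds up the numeric part of each size and drops the unit, so the total is only meaningful when every row uses the same unit. The following is a minimal sketch of a unit-aware variant. It assumes the size column looks like `1.5 KB` or `2 MB` with 1024-based units; the names `UNIT_FACTORS`, `parse_size`, and `hdfs_total_bytes`, and the sample rows, are hypothetical and not part of the original gist.

```python
# Hypothetical helpers; the unit table assumes binary (1024-based) suffixes.
UNIT_FACTORS = {'B': 1, 'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4}


def parse_size(text):
    """Convert a string such as '1.5 MB' to a number of bytes."""
    value, _, unit = text.strip().partition(' ')
    # A bare number with no suffix is treated as bytes.
    return float(value) * UNIT_FACTORS[unit.strip().upper() or 'B']


def hdfs_total_bytes(lines, size_column=3):
    """Sum the size column over tab-separated rows, skipping blank lines."""
    return sum(parse_size(row.split('\t')[size_column]) for row in lines if row.strip())


# Made-up rows in the assumed layout: permission, owner, group, size, name.
sample = [
    '-rw-r--r--\thdfs\tsupergroup\t1.5 KB\tpart-00000',
    '-rw-r--r--\thdfs\tsupergroup\t2 MB\tpart-00001',
    '',
]
print(hdfs_total_bytes(sample))
```

Taking an iterable of lines rather than a path keeps the sketch easy to try on pasted text; pass `open(path)` to use it on a saved listing.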
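# Open the YARN application page in a browser.
def open_yarn_application_page(app_name):
    import requests
    import webbrowser

    host = '127.0.0.1:8088'  # ResourceManager web address
    resp = requests.get('http://{host}/ws/v1/cluster/apps'.format(host=host))
    resp.raise_for_status()
    # 'apps' may be null when the cluster reports no applications.
    apps = (resp.json().get('apps') or {}).get('app', [])
    # Pick the first running application whose name contains app_name.
    app = next((a for a in apps if a['state'] == 'RUNNING' and app_name in a['name']), None)
    if app is None:
        raise LookupError('no running YARN application matches {!r}'.format(app_name))
    app_url = 'http://{host}/proxy/{app_id}/'.format(host=host, app_id=app['id'])
    webbrowser.open_new_tab(app_url)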
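The lookup inside `open_yarn_application_page` can be tried without a cluster by separating it from the HTTP call. This is a rough sketch only: `find_running_app` is a hypothetical helper, and the sample payload is made up, containing just the fields the gist reads (`id`, `name`, `state` under `apps.app`).

```python
def find_running_app(payload, app_name):
    """Return the first RUNNING app whose name contains app_name, or None."""
    # Tolerate a missing or null 'apps' entry by falling back to an empty list.
    apps = (payload.get('apps') or {}).get('app', [])
    for app in apps:
        if app['state'] == 'RUNNING' and app_name in app['name']:
            return app
    return None


# Made-up sample payload with only the fields the lookup reads.
sample = {'apps': {'app': [
    {'id': 'application_0000000000000_0001', 'name': 'etl-daily', 'state': 'FINISHED'},
    {'id': 'application_0000000000000_0002', 'name': 'etl-daily', 'state': 'RUNNING'},
]}}

app = find_running_app(sample, 'etl')
print(app['id'] if app else 'no match')
```

Returning `None` instead of indexing `[0]` lets the caller decide what to do when nothing matches.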