@torkale
Last active August 29, 2015 14:07

Install hue on EMR

Prerequisites

  • AMI version 3.2.1
  • hive 0.13.1

Installation

1. Download the Hue 3.6.0 release

$ wget --no-check-certificate https://dl.dropboxusercontent.com/u/730827/hue/releases/3.6.0/hue-3.6.0.tgz

2. Extract the archive

$ tar -xvf hue-3.6.0.tgz

3. Install dependencies

$ sudo yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel mvn mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel 

4. Install hue

$ cd hue-3.6.0 && PREFIX=/home/hadoop make install

5. Modify hue.ini

Edit ~/hue/desktop/conf/hue.ini and set the following values:
# The port where the ResourceManager IPC listens on
      resourcemanager_port=9022
      
# Enter the filesystem uri
      fs_defaultfs=hdfs://localhost:9000
      
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
      webhdfs_url=http://localhost:9101/webhdfs/v1

      querycache_rows=0
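
The same edits can be scripted. The following is a minimal sketch, not part of the original notes: `set_ini_key` is a hypothetical helper that rewrites an existing `key=value` line (including one commented out with `#` or `##`) and keeps its indentation. It assumes each key occurs once in hue.ini; if a key appears in several sections, every occurrence is changed, so review the file afterwards.

```shell
# set_ini_key FILE KEY VALUE  (hypothetical helper, not shipped with Hue)
# Rewrites every "KEY=..." line in FILE, uncommenting it if needed,
# and preserves the line's leading indentation.
set_ini_key() {
  local file=$1 key=$2 value=$3 tmp status
  tmp=$(mktemp) || return 1
  # "|" is the sed delimiter, so values may contain "/" (URLs) but not "|" or "&".
  sed -E "s|^([[:space:]]*)#*[[:space:]]*${key}[[:space:]]*=.*|\1${key}=${value}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"   # write back in place to keep owner and permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage with the values from this step:
#   conf=~/hue/desktop/conf/hue.ini
#   set_ini_key "$conf" resourcemanager_port 9022
#   set_ini_key "$conf" fs_defaultfs hdfs://localhost:9000
#   set_ini_key "$conf" webhdfs_url http://localhost:9101/webhdfs/v1
```

The helper only changes keys that are already present in the file; it does not add missing ones.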

6. Modify hive-site.xml

```
<property><name>hive.server2.authentication</name><value>NONE</value></property>
<property><name>hive.server2.enable.doAs</name><value>false</value></property>
```

7. Run hiveserver2

$ nohup hiveserver2 > /dev/null &
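
HiveServer2 takes a while to start, so it can help to wait until its Thrift port accepts connections before starting Hue. This is a sketch under assumptions: `wait_for_port` is a hypothetical helper, it relies on bash's `/dev/tcp` redirection, and port 10000 is the HiveServer2 default (`hive.server2.thrift.port`), which this cluster may override.

```shell
# wait_for_port HOST PORT [TIMEOUT_SECONDS]  (hypothetical helper)
# Retries once per second until a TCP connection to HOST:PORT succeeds.
# Returns 0 on success, 1 if the timeout (default 60s) is reached first.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  # The subshell opens the socket and closes it again when it exits.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    waited=$((waited + 1))
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
  done
}

# Usage, assuming the default HiveServer2 port:
#   wait_for_port localhost 10000 120 && echo "hiveserver2 is accepting connections"
```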

8. Run supervisor

$ ~/hue/build/env/bin/supervisor

Open issues

  1. Hive 0.13.1 events.hql requires parquet adjustments (STORED AS PARQUET)

  2. Hive query -> Can't get log

    [07/Oct/2014 05:48:56 -0700] thrift_util  ERROR    Thrift saw exception (this may be expected).
    Traceback (most recent call last):
      File "/home/hadoop/hue/desktop/core/src/desktop/lib/thrift_util.py", line 371, in wrapper
        ret = res(*args, **kwargs)
      File "/home/hadoop/hue/apps/beeswax/src/beeswax/../../gen-py/TCLIService/TCLIService.py", line 745, in GetLog
        return self.recv_GetLog()
      File "/home/hadoop/hue/apps/beeswax/src/beeswax/../../gen-py/TCLIService/TCLIService.py", line 761, in recv_GetLog
        raise x
    TApplicationException: Invalid method name: 'GetLog'

  3. Resource manager is unavailable

    [07/Oct/2014 05:49:47 -0700] base         ERROR    Internal Server Error: /jobbrowser/
    Traceback (most recent call last):
      File "/home/hadoop/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py", line 111, in get_response
        response = callback(request, *callback_args, **callback_kwargs)
      File "/home/hadoop/hue/apps/jobbrowser/src/jobbrowser/views.py", line 96, in jobs
        jobs = get_api(request.user, request.jt).get_jobs(user=request.user, username=user, state=state, text=text, retired=retired)
      File "/home/hadoop/hue/apps/jobbrowser/src/jobbrowser/api.py", line 207, in get_jobs
        json = self.resource_manager_api.apps(**filters)
      File "/home/hadoop/hue/desktop/libs/hadoop/src/hadoop/yarn/resource_manager_api.py", line 72, in apps
        return self._root.get('cluster/apps', params=kwargs, headers={'Accept': _JSON_CONTENT_TYPE})
      File "/home/hadoop/hue/desktop/core/src/desktop/lib/rest/resource.py", line 90, in get
        return self.invoke("GET", relpath, params, headers=headers, allow_redirects=True)
      File "/home/hadoop/hue/desktop/core/src/desktop/lib/rest/resource.py", line 73, in invoke
        urlencode=self._urlencode)
      File "/home/hadoop/hue/desktop/core/src/desktop/lib/rest/http_client.py", line 154, in execute
        raise self._exc_class(ex)
    RestException: HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?user=jondot&finalStatus=UNDEFINED (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
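
The trace shows Hue calling the ResourceManager REST API on localhost:8088 and getting a refused connection, so one possible cause is that the API listens on a different address on this cluster. The sketch below probes candidate base URLs with curl. The helper names are hypothetical, and port 9026 in the usage line is only a guess; the real address is whatever `yarn.resourcemanager.webapp.address` in yarn-site.xml says.

```shell
# rm_api_up BASE_URL  (hypothetical helper)
# Succeeds if the YARN ResourceManager REST API answers at BASE_URL.
rm_api_up() {
  curl --silent --fail --max-time 5 "$1/ws/v1/cluster/info" > /dev/null
}

# find_rm_api BASE_URL...  (hypothetical helper)
# Prints the first base URL that answers; returns 1 if none does.
find_rm_api() {
  local url
  for url in "$@"; do
    if rm_api_up "$url"; then
      echo "$url"
      return 0
    fi
  done
  return 1
}

# Usage (8088 is taken from the trace; 9026 is an unverified guess):
#   find_rm_api http://localhost:8088 http://localhost:9026
```

If one of the URLs answers, setting `resourcemanager_api_url` to it in the YARN cluster section of hue.ini may resolve this issue; that fix has not been verified here.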
    