
Configure Hue with HTTPS / SSL

SSL / HTTPS is often not simple to set up. Here is some additional light, complementing the Cloudera Security guide, that should help.

 

SSL between your browser and Hue

To configure Hue to use HTTPS, we need a self-signed SSL certificate that does not require a passphrase.

Here is how to generate a private key and a self-signed certificate for the Hue server:

openssl genrsa 4096 > server.key

openssl req -new -x509 -nodes -sha1 -key server.key > server.cert


Note: answer the questions that follow (complete example below). Entering the hostname of the server is important.

Note: you will have to tell your browser to “trust” the self-signed server certificate.

 

Then in the Hue configuration in CM or in the hue.ini:

  • Check Enable HTTPS
  • Enter path to server.cert in Local Path to SSL Certificate (ssl_certificate)
  • Enter path to server.key in Local Path to SSL Private Key (ssl_private_key)

Make sure Hue is setting the cookie as secure.

Note: when using a load balancer, you might in certain cases need to set secure_proxy_ssl_header.
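In the hue.ini, the same settings look roughly like this (the paths are placeholders for wherever you stored the files; the [[session]] secure flag is what marks the cookie as secure):

```ini
[desktop]
  ssl_certificate=/path/to/server.cert
  ssl_private_key=/path/to/server.key

  [[session]]
    # Only send the session cookie over HTTPS
    secure=true
```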

 

[Image: hue-with-https]

 

Here is a complete example of creating a certificate for enabling SSL:

[root@cehd1 hue]# pwd
/home/hue
[root@cehd1 hue]# ls
cacerts  cert  key

Generate a private key for the server:

[root@cehd1 hue]# openssl genrsa -out key/server.key 4096

Generate a “certificate request” for the server:

[root@cehd1 hue]# openssl req -new -key key/server.key -out request/server.csr

You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank. For some fields there will be a default value, if you enter ‘.’, the field will be left blank.

Country Name (2 letter code) [XX]:US
 State or Province Name (full name) []:Colorado
 Locality Name (eg, city) [Default City]:Denver
 Organization Name (eg, company) [Default Company Ltd]:Cloudera
 Organizational Unit Name (eg, section) []:COE
 Common Name (eg, your name or your server's hostname) []:test.lab
 Email Address []:

Please enter the following 'extra' attributes to be sent with your certificate request
A challenge password []:  ## note: this was left blank
An optional company name []:

Self-sign the request, creating a certificate for the server:

[root@cehd1 hue]# openssl x509 -req -days 365 -in request/server.csr -signkey key/server.key -out cert/server.crt
 Signature ok
 subject=/C=US/ST=Colorado/L=Denver/O=Cloudera/OU=COE/CN=test.lab
 Getting Private key

[root@cehd1 hue]# ls -lR
.
 total 16
 drwxr-xr-x 2 hue  root 4096 Jul 16 18:04 cacerts
 drwxr-xr-x 2 root root 4096 Jul 31 10:02 cert
 drwxr-xr-x 2 root root 4096 Jul 31 09:46 key
 drwxr-xr-x 2 root root 4096 Jul 31 10:00 request
 ./cacerts:
 total 4
 -rw-r--r-- 1 hue root 2036 Jul 16 18:04 win2k8x64-ad2-ca.pem
 ./cert:
 total 4
 -rw-r--r-- 1 root root 1907 Jul 31 10:02 server.crt
 ./key:
 total 4
 -rw-r--r-- 1 root root 3243 Jul 31 09:49 server.key
 ./request:
 total 4
 -rw-r--r-- 1 root root 1704 Jul 31 10:00 server.csr
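The same flow can be scripted non-interactively; this is a sketch where the -subj flag supplies the same DN fields as the prompts above, and the modulus comparison is a quick sanity check that the certificate actually matches the key:

```shell
mkdir -p key request cert
openssl genrsa -out key/server.key 4096
# -subj answers the DN prompts non-interactively (same values as the example)
openssl req -new -key key/server.key -out request/server.csr \
    -subj "/C=US/ST=Colorado/L=Denver/O=Cloudera/OU=COE/CN=test.lab"
openssl x509 -req -days 365 -in request/server.csr -signkey key/server.key -out cert/server.crt
# The cert matches the key if the two modulus hashes are identical
openssl x509 -noout -modulus -in cert/server.crt | openssl md5
openssl rsa  -noout -modulus -in key/server.key  | openssl md5
```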

 

SSL between Hue and the Hadoop components

The above was for having the Web browser use SSL when talking with Hue. In order to have Hue use SSL for talking to YARN, Hive, HDFS, … we need another property: REQUESTS_CA_BUNDLE as described in HUE-2082 (and sometimes more in the case of Hive for example).

 

I discovered that Hue’s truststore (the file pointed to by REQUESTS_CA_BUNDLE) has to contain the certificate not only of the NameNode, but of other nodes as well. I don’t know exactly which other nodes, but I suspect it’s every node that has a DataNode role. It’s easiest just to assume that the certs for all nodes need to be in the Hue truststore.

This is because we’re using self-signed test certs, not CA-signed certs. If we were using CA-signed certs, we could just put the CA cert chain in the Hue truststore.

Also, the Hue truststore has to be in PEM file format. At Cloudera we are using the JKS format for Hadoop SSL. So in order to populate the Hue truststore, you have to extract the certificates from the JKS keystores and convert them to PEM format. Here are the commands for doing that, given a JKS keystore called hadoop-server.keystore, on a host named foo-1.ent.cloudera.com:

keytool -exportcert -keystore hadoop-server.keystore -alias foo-1.cloudera.com \
        -storepass cloudera -file foo-1.cert
openssl x509 -inform der -in foo-1.cert > foo-1.pem

Once you’ve done this for each host in the cluster, you can concatenate the .pem files into one .pem file which can serve as the Hue truststore:

cat foo-1.pem foo-2.pem ... > huetrust.pem

Once huetrust.pem has been created, set REQUESTS_CA_BUNDLE in the Hue environment safety valve to /etc/hadoop/ssl-conf/huetrust.pem

[Image: hue-contact-https]

 

Here is an interesting link if you want to read more about generating SSL certificates.

 

As usual feel free to comment and send feedback on the hue-user list or @gethue!

 

 

 


Automatic High Availability with Hue and Cloudera Manager

By default, Hue installs on a single machine, so it is constrained to that machine’s CPU and memory, which can limit the total number of active users before Hue becomes unstable. Furthermore, even a lightly loaded machine can crash, which would bring Hue out of service. This tutorial demonstrates hue-lb-example, an example load balancer that can automatically configure NGINX and HAProxy for a Cloudera Manager-managed Hue.

Before we demonstrate its use, we need to install a couple things first.

Configuring Hue in Cloudera Manager

Hue should be set up on at least two of the nodes in Cloudera Manager and be configured to use a database like MySQL, PostgreSQL, or Oracle configured in a high availability manner. Furthermore, the database must be configured to be accessible from all the Hue instances. You can find detailed instructions on setting up or migrating the database from SQLite here.

Once the database has been set up, the following instructions describe setting up a fresh install. If you have an existing Hue, jump to step 5.

  1. Open Cloudera Manager.
  2. Go to “Add a Service -> Hue”, and follow the directions to create the first Hue instance.
    [Image: hue-lb-0]

  3. Once complete, stop the Hue instance so we can change the underlying database.
  4. Go to “Hue -> Configuration -> Database”, enter the database connection information, and save.
    [Image: hue-lb-db]
  5. Go to “Hue -> Instances -> Add a Role Instance”.
    [Image: hue-lb-1]
  6. Select “Hue” and select which services you would like to expose on Hue. If you are using Kerberos, make sure to also add a “Kerberos Ticket Renewer” on the same machine as this new Hue role.
    [Image: hue-lb-2]
  7. On “Customize Role Assignments”, add at least one other “Hue Server” instance on another machine.
  8. Start the new Hue Server.
    [Image: hue-lb-3]

Installing the Dependencies

On a Redhat/Fedora-based system:

% sudo yum install git nginx haproxy python python-pip
% pip install virtualenv

On a Debian/Ubuntu-based system:

% sudo apt-get install git nginx haproxy python python-pip
% pip install virtualenv

Running the load balancers

First we want to start the load balancer:

First, change into the load balancer directory:

% cd $HUE_HOME_DIR/tools/load-balancer

Next we install the load balancer specific dependencies in a python virtual environment to keep those dependencies from affecting other projects on the system.

% virtualenv build
% source build/bin/activate
% pip install -r requirements.txt

Finally, modify etc/hue-lb.toml to point at your instance of Cloudera Manager (as in “cloudera-manager.example.com” without the port or “http://”), and provide a username and password for an account that has read access to the Hue state.
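The fragment below is purely illustrative: the actual key names come from the etc/hue-lb.toml shipped with hue-lb-example, so treat these as placeholders for your own CM host and a read-only account:

```toml
# Illustrative key names only; check the etc/hue-lb.toml in your checkout.
[cloudera-manager]
host = "cloudera-manager.example.com"
username = "hue-lb-readonly"
password = "secret"
```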

Now we are ready to start the load balancers. Run:

% ./bin/supervisord
% ./bin/supervisorctl status
haproxy RUNNING pid 36920, uptime 0:00:01
monitor-hue-lb RUNNING pid 36919, uptime 0:00:01
nginx RUNNING pid 36921, uptime 0:00:01

You should be able to access Hue at either http://HUE-LB-HOSTNAME:8000 for NGINX, or http://HUE-LB-HOSTNAME:8001 for HAProxy. To demonstrate that it’s load balancing:

  1. Go into Cloudera Manager, then “Hue”, then “Instances”.
  2. Stop the first Hue instance.
  3. Access the URL and verify it works.
  4. Start the first instance, and stop the second instance.
  5. Access the URL and verify it works.

[Image: hue-lb-4]

Finally, if you want to shut down the load balancers, run:

% ./bin/supervisorctl shutdown

Automatic Updates from Cloudera Manager

The Hue load balancer uses Supervisor, a service that monitors and controls other services. It can be configured to automatically restart services if they crash, or to trigger scripts when certain events occur. The load balancer starts and monitors NGINX and HAProxy through another process named monitor-hue-lb, which uses the Cloudera Manager API to check the status of Hue in Cloudera Manager and automatically add and remove Hue instances from the load balancers. If it detects that a Hue instance has been added or removed, it updates the configuration of all the active load balancers and triggers them to reload without dropping any connections.

Sticky Sessions

Both NGINX and HAProxy are configured to route each user to the same backend, otherwise known as sticky sessions. This is done partly for performance, since the Hue backend that served a user is more likely to have that user’s data cached, but also because Impala does not yet support native high availability (IMPALA-1653). This means that an Impala session opened by one Hue instance cannot be accessed by another Hue instance. With sticky sessions, users are always routed to the same Hue instance, so they can still access their Impala sessions, assuming, of course, that the instance is still active. If not, the user is routed to one of the other active Hue instances.
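As a rough sketch of what sticky sessions look like in NGINX terms (this is not the generated config itself; ip_hash pins a client to one backend, and the hostnames are placeholders):

```nginx
upstream hue {
    # ip_hash routes each client IP to the same backend: sticky sessions
    ip_hash;
    server hue-1.example.com:8888;
    server hue-2.example.com:8888;
}
server {
    listen 8000;
    location / {
        proxy_pass http://hue;
    }
}
```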

Have any questions? Feel free to contact us on hue-user or @gethue!

Export and import your Search dashboards

There is no handy way to import and export your Search Dashboard until Hue 4 and HUE-1660, but here is a manual workaround:

./build/env/bin/hue dumpdata search --indent 2 > data.json

then

./build/env/bin/hue loaddata data.json

 

And that’s it, the dashboards with the same IDs will be refreshed with the imported ones!

[Image: search-dashboard-list]

 

Note:

If using CM, export this variable in order to point to the correct database:

HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER
echo $HUE_CONF_DIR
export HUE_CONF_DIR
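If you do not want to look up the <id> by hand, a small sketch like this (assuming the standard CM agent layout) selects the most recent process directory automatically, since ls -t sorts newest first:

```shell
# Pick the most recent hue-HUE_SERVER process directory automatically
HUE_CONF_DIR=$(ls -1dt /var/run/cloudera-scm-agent/process/*-hue-HUE_SERVER | head -1)
echo $HUE_CONF_DIR
export HUE_CONF_DIR
```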

Where <id> is the most recent ID in that process directory for hue-HUE_SERVER.

 

Have any questions? Feel free to contact us on hue-user or @gethue!

Team retreat in the Philippines

Hello/Kumusta!

After Central America at the beginning of winter, it was again the perfect timing for one more exotic location: the Philippines!

The team flew to the other side of the world on the island of Boracay for some great tricycle action, kitesurfing and discovering some new culture and food:

 

Onwards!!

Hue Team

 

 

[Image: 2015-01-30 11.19.41]

[Image: 2015-01-28 17.18.08]

[Image: 2015-01-30 22.22.55]

[Image: 2015-01-26 12.33.39]

[Image: 2015-01-25 14.09.08]

[Image: 2015-01-26 18.03.32]

[Image: 2015-01-28 17.17.12]

[Image: 2015-01-28 17.52.28]

Hue API: Execute some builtin commands

Hue comes with a set of commands for simplifying the management of the service. Here is a quick guide about how to use them.

 

Get started

If using CM, export this variable in order to point to the correct Hue:

cd /opt/cloudera/parcels/CDH/lib/hue
HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER
echo $HUE_CONF_DIR
export HUE_CONF_DIR

Where <id> is the most recent ID in that process directory for hue-HUE_SERVER.

If not using CM, just go in the root of Hue, normally:

cd /usr/lib/hue

 

Executing the hue command with no argument will list them all:

./build/env/bin/hue

...

[auth]
 changepassword
 createsuperuser

[beeswax]
 beeswax_install_examples
 close_queries
 close_sessions

[desktop]
 config_dump
 config_help
 config_upgrade
 create_desktop_app
 create_proxy_app
 create_test_fs
 kt_renewer
 runcherrypyserver
 runcpserver
 runpylint
 sync_documents
 test
 test_windmill
 version

[django]
 cleanup
 compilemessages
 createcachetable
 dbshell
 diffsettings
 dumpdata
 flush
 inspectdb
 loaddata
 makemessages
 reset
 runfcgi
 runserver
 shell
 sql
 sqlall
 sqlclear
 sqlcustom
 sqlflush
 sqlindexes
 sqlinitialdata
 sqlreset
 sqlsequencereset
 startapp
 startproject
 validate

[django_extensions]
 clean_pyc
 compile_pyc
 create_app
 create_command
 create_jobs
 describe_form
 dumpscript
 export_emails
 generate_secret_key
 graph_models
 mail_debug
 passwd
 print_user_for_session
 reset_db
 runjob
 runjobs
 runprofileserver
 runscript
 runserver_plus
 set_fake_emails
 set_fake_passwords
 shell_plus
 show_templatetags
 show_urls
 sqldiff
 sync_media_s3
 syncdata
 unreferenced_files

[django_openid_auth]
 openid_cleanup

[hbase]
 hbase_setup

[indexer]
 indexer_setup

[oozie]
 oozie_setup

[pig]
 pig_setup

[search]
 search_setup

[south]
 convert_to_south
 datamigration
 graphmigrations
 migrate
 migrationcheck
 schemamigration
 startmigration
 syncdb
 testserver

[spark]
 livy_server

[useradmin]
 import_ldap_group
 import_ldap_user
 sync_ldap_users_and_groups
 useradmin_sync_with_unix

Starting the server

For starting the test server, which defaults to port 8000:

./build/env/bin/hue runserver

For starting the production server, which defaults to port 8888:

./build/env/bin/hue runcpserver

These commands are more detailed on the How to get started page.

Installing the examples

All the commands ending in ‘_setup’ will install the examples for that particular app.

./build/env/bin/hue search_setup

In the case of Hive, in order to install the sample_07 and sample_08 tables and SQL queries, type:

./build/env/bin/hue beeswax_install_examples

Note:

These commands are also accessible directly from the Web UI.

[Image: Screenshot from 2014-04-09 08:06:15]

Changing a password

This command is explained in more detail in the How to change or reset a forgotten password post:

./build/env/bin/hue changepassword

Closing Hive queries

This command is explained in more detail in the Hive and Impala queries life cycle post:

./build/env/bin/hue close_queries
./build/env/bin/hue close_sessions

Running the tests

This command is explained in more detail in the How to run the tests post:

./build/env/bin/hue test

Connect to the Database

This command is explained in more detail in the How to manage the database with the shell post:

./build/env/bin/hue dbshell

 

 

Have any questions? Feel free to contact us on hue-user or @gethue!

Hue 3 on HDP installation tutorial

From Andrew Mo (mo@andrewmo.com)
Insight Data Science – Data Engineering Fellow

Last month I started a guest post on gethue.com demonstrating the steps required to use HUE 3.7+ with the Hortonworks Data Platform (HDP); I’ve used HUE successfully with HDP 2.1 and 2.2, and have created a step-by-step guide on using HUE 3.7.1 with HDP 2.2 below.

I’m participating in the Insight Data Science Data Engineering Fellows program and built a real-time data engineering pipeline proof of concept using Apache Kafka, Storm, and Hadoop in a “Lambda Architecture.” Cloudera CDH and Cloudera Manager are great tools, but I wanted to use Apache Ambari to deploy and manage Kafka and Storm with Hadoop; for these reasons, HDP 2.2 was selected for the project (note from @gethue: in CDH, Kafka is available and Spark Streaming is preferred to Storm, and CM installs/configures all of Hue automatically).

HUE is one of Hadoop’s most important projects, as it significantly increases a user’s ease of access to the power of the Hadoop platform. While Hive and YARN provide a processing backbone for data analysts familiar with SQL to use Hadoop, HUE provides my interface of choice for data analysts to quickly get connected with big data and Hadoop’s powerful tools.

With HDP, HUE’s features and ease of use are something I always miss, so I decided to add HUE 3.7.1 to my HDP clusters.

Features confirmed to work in partial or complete fashion:

• Hive/Beeswax
• File Browser
• HDFS FACL Manager
• HBase Cluster Browser
• Job Browser

Still working on debugging/integrating Pig/Oozie!

Spark is on my to do list as well.

Technical Details:

• Distribution: Hortonworks Data Platform (HDP) 2.2
• Cluster Manager: Apache Ambari 1.7
• Environment: Amazon EC2
• Operating System: Ubuntu 12.04 LTS (RHEL6/CentOS6 works fine as well)

HUE will be deployed as a “Gateway” access node to our Hadoop cluster; this means that none of the core Hadoop services or clients are required on the HUE host.

[Image: hue-hdp-archi]

Installing HUE

[Image: hue-hdp-2]

For this walk-through, we’ll assume that you’ve already deployed a working cluster using Apache Ambari 1.7.

 

Let’s go on the HUE Host (Gateway node) and get started by preparing our environment and downloading the HUE 3.7.1 release tarball.

RHEL/CentOS uses ‘yum’ for package management.

Ubuntu uses ‘apt-get’ for package management. In our example, we’re using Ubuntu.

Prepare dependencies:

sudo apt-get install -y ant
sudo apt-get install -y gcc g++
sudo apt-get install -y libkrb5-dev libmysqlclient-dev
sudo apt-get install -y libssl-dev libsasl2-dev libsasl2-modules-gssapi-mit
sudo apt-get install -y libsqlite3-dev
sudo apt-get install -y libtidy-0.99-0 libxml2-dev libxslt-dev
sudo apt-get install -y maven
sudo apt-get install -y libldap2-dev
sudo apt-get install -y python-dev python-simplejson python-setuptools

Download HUE 3.7.1 release tarball:

• wget https://dl.dropboxusercontent.com/u/730827/hue/releases/3.7.1/hue-3.7.1.tgz

Make sure you have Java installed and configured correctly!
I’m using Open JDK 1.7 in this example:

sudo apt-get install -y openjdk-7-jre openjdk-7-jdk
echo "JAVA_HOME=\"/usr/lib/jvm/java-7-openjdk-amd64/jre\"" | sudo tee -a /etc/environment

Unpack the HUE 3.7.1 release tarball and change into the resulting directory.

Install HUE:

sudo make install

By default, HUE installs to ‘/usr/local/hue’ in your Gateway node’s local filesystem.

As installed, the HUE installation folders and file ownership will be set to the ‘root’ user.

Let’s fix that so HUE can run correctly without root user permissions:

sudo chown -R ubuntu:ubuntu /usr/local/hue

Configuring Hadoop and HUE

HUE uses a configuration file to understand information about your Hadoop cluster and where to connect to. We’ll need to configure our Hadoop cluster to accept connections from HUE, and add our cluster information to the HUE configuration file.

Hadoop Configuration

Ambari provides a convenient single point of management for a Hadoop cluster and related services. We’ll need to reconfigure our HDFS, Hive (WebHcatalog), and Oozie services to take advantage of HUE’s features.

HDFS

We need to do three things, (1) ensure WebHDFS is enabled, (2) add ‘proxy’ user hosts and groups for HUE, and (3) enable HDFS file access control lists (FACLs) (optional).
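In XML terms (whether set through Ambari custom properties or directly in the files), these settings look like the following; the ‘*’ wildcards can be narrowed to the HUE host and specific groups:

```xml
<!-- core-site.xml: let the 'hue' user impersonate end users -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

<!-- hdfs-site.xml: enable WebHDFS -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```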

[Image: hue-hdp-3]
[Image: hue-hdp-4]

[Image: hue-hdp-6]
[Image: hue-hdp-28]

Hive (WebHcat) and Oozie

We’ll also need to set up proxy user hosts and groups for HUE in our Hive and Oozie service configurations.
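The corresponding proxy user properties look like this in XML form:

```xml
<!-- webhcat-site.xml -->
<property>
  <name>webhcat.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>webhcat.proxyuser.hue.groups</name>
  <value>*</value>
</property>

<!-- oozie-site.xml -->
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```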

[Image: hue-hdp-12]
[Image: hue-hdp-10]

Once these cluster configuration updates have been set, save, and restart these services on the respective cluster nodes.

Confirm WebHDFS is running:

[Image: hue-hdp-8]

HUE Configuration

The HUE configuration file can be found at ‘/usr/local/hue/desktop/conf/hue.ini’

Be sure to make a backup before editing!

[Image: hue-hdp-1]

We’ll need to populate ‘hue.ini’ with our cluster’s configuration information.

Examples are included below, but will vary with your cluster’s configuration.

In this example, the cluster is small, so our cluster NameNode also happens to be the Hive Server, Hive Metastore, HBase Master, one of three Zookeepers, etc.

WebHDFS needs to point to our cluster NameNode:

[Image: hue-hdp-7]
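In the hue.ini this is the [hadoop] section; the hostname below is a placeholder for your NameNode’s FQDN:

```ini
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      webhdfs_url=http://namenode.example.com:50070/webhdfs/v1
```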

Configure the correct values for our YARN cluster Resource Manager, Hive, Oozie, etc:

[Image: hue-hdp-13]
[Image: hue-hdp-14]

[Image: hue-hdp-15]
[Image: hue-hdp-38]
[Image: hue-hdp-30]
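A sketch of the relevant hue.ini sections, again with placeholder hostnames that you should replace with your own service hosts:

```ini
[hadoop]
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=namenode.example.com
      resourcemanager_api_url=http://namenode.example.com:8088

[beeswax]
  hive_server_host=namenode.example.com

[liboozie]
  oozie_url=http://namenode.example.com:11000/oozie
```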

To disable HUE ‘apps’ that aren’t necessary, or are unsupported, for our cluster, use the Desktop ‘app_blacklist’ property. Here I’m disabling the Impala and Sentry/Security tabs (note: the HDFS FACLs tab is disabled if the ‘Security’ app is disabled).
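For this cluster that amounts to:

```ini
[desktop]
  app_blacklist=impala,security
```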

Start HUE on HDP

• We start the HUE server using the ‘supervisor’ command.

[Image: hue-hdp-19]

• Use the ‘-d’ switch to start the HUE supervisor in daemon mode.

Connect to your new HUE server at its IP address/FQDN and the default port of ‘8888’.

[Image: hue-hdp-18]

It works!

Congratulations, you’re running HUE 3.7.1 with HDP 2.2!

 

Let’s take a look around at HUE’s great features:

[Image: hue-hdp-21]

[Image: hue-hdp-25]

[Image: hue-hdp-24]

[Image: hue-hdp-29]

[Image: hue-hdp-31]

[Image: hue-hdp-39]

[Image: hue-hdp-40]

[Image: hue-hdp-48]

[Image: hue-hdp-49]

 

Have any questions? Feel free to contact Andrew or the hue-user list / @gethue!

Fixing the YARN Invalid resource request, requested memory < 0, or requested memory > max configured

Are you seeing this error when submitting a job to YARN? Are you launching an Oozie workflow with a Spark action? You might be hitting this issue!

Error starting action [spark-e27e]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=1024
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:203)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:377)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:320)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:574)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:213)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
]


[Image: oozie-yarn-mem]

Your job is asking for more memory than YARN is configured to allow. One way to fix it is to raise these parameters to something like 2000:

yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
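In yarn-site.xml (or through the equivalent CM/Ambari settings) the change looks like the following; 2048 is just an example that exceeds the 1536 MB requested in the error above:

```xml
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
```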

Have any questions? Feel free to contact us on hue-user or @gethue!

Export and import your Oozie workflows

There is no handy way to import and export your Oozie workflows until Hue 4 and HUE-1660, but here is a manual workaround, possible since Hue 3.8/CDH5.4 and its new Oozie Editor.

The previous methods were very error prone, as they required inserting data into multiple tables at the same time. Now there is only one record per workflow.

 

Export all workflows

./build/env/bin/hue dumpdata desktop.Document2 --indent 2 --natural > data.json

 

Export specific workflows

20000013 is the id you can see in the URL of the workflow.

./build/env/bin/hue dumpdata desktop.Document2 --indent 2 --pks=20000013 --natural > data.json

You can specify more than one id:

--pks=20000013,20000014,20000015

 

Load the workflows

Then

./build/env/bin/hue loaddata data.json

 

Refresh the documents

Until we hit Hue 4, this step is required in order to make the imported documents appear:

./build/env/bin/hue sync_documents

 

And that’s it, the workflows with the same IDs will be refreshed with the imported ones!

[Image: oozie-spark]

 

Note:

If the document with the same id already exists in the database, just set its id to null in data.json and it will be inserted as a new document.

vim data.json

then change

"pk": 16,

to

"pk": null,

 

Note:

If using CM, export this variable in order to point to the correct database:

HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER
echo $HUE_CONF_DIR
export HUE_CONF_DIR

Where <id> is the most recent ID in that process directory for hue-HUE_SERVER.

 

Have any questions? Feel free to contact us on hue-user or @gethue!


Solr Search UI – How to configure Hue with only the Search App

The Solr Search App has been a great success, and users often wonder if they could use it without the Hadoop-related apps. As the app only uses the standard Apache Solr REST API, and Hue allows customizing which apps to show, the answer is yes!

 

1. Install Hue from the links on the ‘Download’ section menu

[Image: hue-download]

 

2. Only enable the Solr Search App

In the hue.ini (See ‘Where is my hue.ini‘?), blacklist all the other apps:

[desktop]
  app_blacklist=beeswax,impala,security,filebrowser,jobbrowser,rdbms,jobsub,pig,hbase,sqoop,zookeeper,metastore,spark,oozie,indexer

 

Restart Hue and voila! Drag & drop widgets and build dynamic search dashboards in seconds!

[Image: search-only]

 

Have any questions? Feel free to contact us on hue-user or @gethue!

 

Note: if you want to install the examples, keep the indexer app enabled, i.e. remove ‘indexer’ from the blacklist above.


Note: the app is primarily tested in SolrCloud mode, but it works with regular Solr too.

Start developing Hue on a Mac in a few minutes!

You might already have all the prerequisites installed, but we are going to show how to start from a fresh Yosemite (10.10) install and end up with Hue running on your Mac in almost no time!

[Image: Screenshot 2015-03-24 09.11.26]

We are going to be using the official Quickstart VM from Cloudera that already packs all the Hadoop ecosystem components your Hue will talk to. If you don’t have the latest one already downloaded and running, please visit this link and choose the version that suits you best.

In the meanwhile, let’s set up your Mac!

Step 1: Clone the Hue repository
To clone the Hue Github repository you need git installed on your system. Git (plus a ton of other tools) is included in the Xcode command line tools. To install it open Terminal and type

xcode-select --install

In the dialog choose “Install”. If on Terminal you have the message “xcode-select: error: command line tools are already installed, use “Software Update” to install updates” it means you are almost good to go already.

From Terminal, navigate to a directory where you keep all your projects and run

git clone https://github.com/cloudera/hue.git

You now have the Hue source code in your Mac.

Step 2: Install Java
The build process uses Java to run. A quick way to get to the right download URL from Oracle is to run from Terminal

java -version

and then click on the “More info” button on the dialog that appears. On Oracle’s website, accept the license and choose the Mac OS X JDK link. After the DMG has been downloaded, open it and double click on the installation package. Now, if we return to the Terminal and type again

java -version

we will have the version of the freshly installed JDK. At the time of writing, 1.8.0_40.

Step 3: Install the pre-requisites
Hue uses several libraries that are not included in the Xcode command line tools, so we will need to install those too. To do that we will use Homebrew, the fantastic open source package manager for Mac OS X. Install it from Terminal with

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

You will need to enter your password to continue. Then, as suggested by the installation script, run

brew doctor

If you already have Homebrew installed, just update it running

brew update

As a first thing, we need to install Maven 3

brew install maven

And then MySQL, to get its development libraries:

brew install mysql

Step 4: Compile and configure Hue
Now that we are all set with the requirements we can compile Hue by running

make apps

from the Hue folder that was created by the git clone in step 1. After a while, if everything goes as planned, you should see as a last build message something like “N static files copied to …”.

[Image: Screenshot 2015-03-24 09.09.20]

Hue comes with a default configuration file that points all the services to the local machine. Since we are using a VM for this purpose, we will need to change several conf lines. For your convenience, we have the file readily available here.

Just copy this file over to your hue/desktop/conf folder!

Step 5: Configure your /etc/hosts
The last thing we should do is to start the Quickstart VM and get its IP address

[Image: Screenshot 2015-03-24 08.56.33]

(you can launch the terminal inside the VM and run ‘ifconfig’ for that; in my case it’s 172.16.156.130). Then, on your Mac, edit the hosts file with

sudo vi /etc/hosts

and add the line

172.16.156.130 quickstart.cloudera

with the IP you got from the VM. Save and you are good to go!

Step 6: Run!
What you have to do on Terminal from the Hue folder is just

./build/env/bin/hue runserver

And point your browser to http://localhost:8000! Go and write a new app now! :)

[Image: Screenshot 2015-03-23 13.35.34]

 

As usual feel free to comment on the hue-user list or @gethue!

HBase Browsing with doAs impersonation and Kerberos

Hue comes with an HBase App that lets you create tables, search for rows, read cell content… in just a few clicks. We are now glad to release the last missing piece of security (available in the upcoming Hue 3.8) for making the app production ready!

The HBase app talks to HBase through a proxy server (called Thrift Server V1) which forwards the commands to HBase. Because Hue stands between the proxy server and the actual user, the proxy server thinks that all the operations (e.g. creating a table, scanning some data…) are coming from the ‘hue’ user and not the actual Web user. This is obviously not very secure!

In order to secure the HBase app for real we need to:

  • make sure that the actual logged-in Hue user performs the operations with their privileges. This is the job of Impersonation.
  • make sure that the Hue server only sends these calls. This is the job of Kerberos strong authentication.

 

Note

We assume that you have installed an HBase Thrift Server in your cluster. If using Cloudera Manager, go to the list of instances of the HBase service and click on ‘Add Role Instances’ and select ‘HBase Thrift Server’.

 

Impersonation

HBase can now be configured to offer impersonation (with or without Kerberos). In our case this means that users can send commands to HBase through Hue and they will still be run under their own credentials (instead of the ‘hue’ user’s).
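Conceptually, impersonation means every call Hue forwards is tagged with the identity of the real end user. A simplified sketch of the idea, using the doAs parameter convention of Hadoop-style proxies (illustrative only, not Hue’s actual client code):

```python
from urllib.parse import urlencode

def thrift_http_url(base, logged_in_user):
    """Build a proxy-server URL carrying the end user's identity via 'doAs'."""
    return base + "?" + urlencode({"doAs": logged_in_user})

url = thrift_http_url("http://thrift-host:9090", "bob")
# 'http://thrift-host:9090?doAs=bob' -- HBase then checks bob's privileges, not hue's
```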

First, make sure you have this in your hbase-site.xml:

<property>
  <name>hbase.thrift.support.proxyuser</name>
  <value>true</value>
</property>

<property>
  <name>hbase.regionserver.thrift.http</name>
  <value>true</value>
</property>

 

Note

If using Cloudera Manager, this is done by typing ‘thrift’ in the configuration search of the HBase service and checking the first two results.

 

Then check in core-site.xml that HBase is authorized to impersonate users:

<property>
  <name>hadoop.proxyuser.hbase.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.hbase.groups</name>
  <value>*</value>
</property>

 

And finally, check that Hue points to a local HBase configuration directory, specified in its hue.ini:

[hbase]
hbase_conf_dir=/etc/hbase/conf

 

Note

If you are using Cloudera Manager, you might want to select the HBase Thrift server in the Hue configuration and enter something like this in the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini.

[hbase]
hbase_conf_dir={{HBASE_CONF_DIR}}

 

And that’s it, start the HBase Thrift Server and Hue and you are ready to go!

 

Security with Kerberos

Now that Hue can send commands to the HBase Thrift Server and tell it to execute them as a certain user, we need to make sure that only Hue is allowed to do this. We use Kerberos in order to strongly authenticate the users to the HBase service. In our case, the HBase Thrift server will accept commands only if they come from the ‘hue’ user.

Make sure that HBase is configured with Kerberos and that you have this in the hbase-site.xml pointed by Hue:

<property>
  <name>hbase.security.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hbase.thrift.kerberos.principal</name>
  <value>hbase/_HOST@ENT.CLOUDERA.COM</value>
</property>

 

 

Note

If using Cloudera Manager or regular Thrift without impersonation, make sure that “HBase Thrift Authentication” (hbase.thrift.security.qop) is set to one of the following:

  • auth-conf: authentication, integrity and confidentiality checking
  • auth-int: authentication and integrity checking
  • auth: authentication only

If using Cloudera Manager, go to “HBase service > Configuration > Service-Wide / Security : HBase Thrift Authentication” and select one of the three options.

And similarly to above, make sure that the hue.ini points to a valid directory with hbase-site.xml:

[hbase]
hbase_conf_dir=/etc/hbase/conf

or

[hbase]
hbase_conf_dir={{HBASE_CONF_DIR}}

 

Restart HBase and Hue, and they should be all secured now!

 

Conclusion

You can now be sure that Hue users can only see or modify what they are allowed to at the HBase level. Hue guarantees that if a user cannot perform a certain operation in the HBase shell, it will be exactly the same through Hue (Hue acts like a ‘view’ on top of HBase).

Note that HBase chose to support impersonation through HTTP Thrift only, so regular Thrift won’t work when using impersonation. The Kerberos support described above also makes more sense now, since the operations are no longer seen as coming from the Hue user! More work is on the way to make all these configuration steps just one click.

 

Now it is time to play with the table examples and open-up HBase to all your users!

 

[Image: hbase]

 

As usual feel free to comment on the hue-user list or @gethue!

 

Note

This error means that the above ‘hadoop.proxyuser.hbase.hosts’ / ‘hadoop.proxyuser.hbase.groups’ properties are not correct:

Api Error: Error 500 User: hbase is not allowed to impersonate romain HTTP ERROR 500 Problem accessing /. 
Reason: User: hbase is not allowed to impersonate bob Caused by:javax.servlet.ServletException: 
User: hbase is not allowed to impersonate bob at org.apache.hadoop.hbase.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:117) at

Note

You might now see permission errors like below.

Api Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=admin, scope=default, action=CREATE)...

This is because either:

  • you are using impersonation and your user ‘bob’ does not have enough HBase privileges
  • you are not using impersonation and the ‘hue’ user does not have enough HBase privileges

 

A quick way to fix this is to just grant all the permissions. Obviously this is not recommended for a real setup; instead, read more about HBase Access Control!

sudo -u hbase hbase shell 

hbase(main):004:0> grant 'bob', 'RWC'

 

Note

If you are getting a “Api Error: TSocket read 0 bytes”, this is because Hue does not know that the Thrift Server is expecting Thrift HTTP. Double check that Hue points to an hbase-site.xml that contains the hbase.regionserver.thrift.http property set to true.

A temporary hack would be to insert this in the hue.ini:

[hbase]
use_doas=true

 

Note

“Api Error: maximum recursion depth exceeded” means that the HBase Thrift server is not running as an HTTP Kerberos service.

In the latest Hue 3.8 you should now just get a 401 error instead.

 

Note

buffered transport mode was not tested when using impersonation but might work.

Add a top banner to Hue!

We have already seen in this post how you can configure Hue in your cluster. But did you know that there’s a property that can make a top banner appear in your Hue installation?
This is quite useful if you want, for instance, to show a disclaimer to your users, to clearly mark a testing or production environment, or to display some dynamic information there. Depending on whether you are using Cloudera Manager or not, you should either add a safety valve or edit a .ini file to use this feature. For details on how to change the configuration, read here. In the desktop/custom section of the ini file you can find the banner_top_html property:

[desktop]
[[custom]]
# Top banner HTML code
banner_top_html=

Then it’s just a matter of writing some HTML/CSS and even Javascript code to customize it as you prefer. Keep in mind that you have a limited height for it (30px). For instance, to write a white message on an orange background like the image at the beginning of this post, you can write this:

[desktop]
[[custom]]
# Top banner HTML code
banner_top_html='<div><i class="fa"></i> This is the test environment for Acme, Inc. - For any problem <a href="mailto:roadrunner@acme.com">roadrunner@acme.com</a></div>'

Or we could even use a very old HTML tag to display a running ticker!

[desktop]
[[custom]]
# Top banner HTML code
banner_top_html='<marquee>Welcome to the test environment.</marquee>'

And what about displaying some real time information on top of Hue? As an example, we are going to update the banner with the latest ISS position at every page change thanks to Open Notify.

[desktop]
[[custom]]
# Top banner HTML code
banner_top_html='<script>
$(document).ready(function(){ $.getJSON("http://api.open-notify.org/iss-now.json?callback=?", function(data){ $("#isspos").html("LAT: "+data.iss_position.latitude+", LNG: "+data.iss_position.longitude); }); })
</script>
<div>The current ISS position is <span id="isspos"></span></div>'

Pretty cool, huh? Now it’s your turn to create something useful with it! As usual feel free to comment on the hue-user list or @gethue!

Using NGINX to speed up Hue

Need for Speed!

In the soon-to-be-released Hue 3.8, we have added the ability to serve Hue’s static files (images, JavaScript, etc.) with an external server like NGINX. This allows us to dramatically cut down the number of files served by the Hue application, making the whole user experience much faster.

For example, in the old version of Hue, rendering the beeswax application on demo.gethue.com performs 73 requests in 3 seconds to download 2.5MB of data.

[Image: without-nginx]

 

In comparison, in Hue 3.8 behind NGINX, rendering that same page performs 5 requests for 130KB in 0.7 seconds.

[Image: with-nginx]

Configuring NGINX

The simplest option is to just follow the instructions described in Automatic High Availability with Hue and Cloudera Manager, which we’ve updated to support this optimization. Or if you want to just set up a simple NGINX configuration, you can install NGINX on Redhat systems with:

% yum install nginx

Or on a Debian/Ubuntu system with:

% apt-get install nginx

Next, add a /etc/nginx/conf.d/hue.conf file with the following contents. Make sure to tweak server_name to this machine’s hostname (or just localhost), the alias to point at Hue’s static files, and the server to point at the Hue instance. Note that if you’re running multiple Hue instances, be sure to use a database like MySQL, PostgreSQL, or Oracle which allows for remote access:

server {
    server_name NGINX_HOSTNAME;
    charset utf-8;

    listen 8001;

    # Or if running hue on https://
    ## listen 8001 ssl;
    ## ssl_certificate /path/to/ssl/cert;
    ## ssl_certificate_key /path/to/ssl/key;

    location / {
        proxy_pass http://hue;

        # Or if the upstream Hue instances are running behind https://
        ## proxy_pass https://hue;
    }

    location /static/ {
        # Uncomment to expose the static file directories.
        ## autoindex on;

        # If Hue was installed with packaging install:
        alias /usr/lib/hue/build/static/;

        # Or if on a parcel install:
        ## alias /opt/cloudera/parcels/CDH/lib/hue/build/static/;

        expires 30d;
        add_header Cache-Control public;
    }
}

upstream hue {
    ip_hash;

    # List all the Hue instances here for high availability.
    server HUE_HOST1:8888 max_fails=3;
    server HUE_HOST2:8888 max_fails=3;
    ...
}
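The ip_hash directive pins each client to the same Hue backend, which matters because Hue sessions are stateful. A rough model of the selection logic (NGINX actually hashes only the first octets of the address; this sketch just illustrates the stickiness property):

```python
import hashlib

def pick_backend(client_ip, backends):
    """Deterministically map a client IP to one backend (simplified ip_hash model)."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

backends = ["HUE_HOST1:8888", "HUE_HOST2:8888"]
# The same client always lands on the same Hue instance:
first = pick_backend("10.0.0.42", backends)
assert pick_backend("10.0.0.42", backends) == first
```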

Finally, start NGINX with sudo service nginx start and navigate to http://NGINX_HOSTNAME:8001.

As usual feel free to comment on the hue-user list or @gethue!

More Solr Search dashboards possibilities

The Search dashboards got a series of new options and long awaited features in Hue 3.8. Here is a summary of the major improvements.

 

Regular users can now also create dashboards

Previously, only Hue admin could access the editor, which was not very practical.

[Image: search-create-menu]

 

Range & Up facet

Interval facets on any type of data have been supported since the first versions. However, some use cases benefit more from range facets with one upper or lower bound left open. Think of getting all the logs younger than 1 day, or restaurants with ratings of 4 stars and above.

[Image: search-and-up]
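Under the hood, such open-ended facets map to standard Solr range syntax, where * leaves one bound open. A small sketch of building the filter clause (the rating field name is made up):

```python
def open_range_filter(field, lower=None, upper=None):
    """Build a Solr fq range clause; '*' leaves a bound open."""
    return "%s:[%s TO %s]" % (
        field,
        lower if lower is not None else "*",
        upper if upper is not None else "*",
    )

fq = open_range_filter("rating", lower=4)
# 'rating:[4 TO *]'  -- everything rated 4 stars and up
```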

 

2D maps

The gradient map is handy for displaying traffic by location. It now supports another dimension, so you can plot, for example, the top browsers or operating systems by country.

[Image: search-2d-map]

 

Multiple widgets using the same fields

This feature is particularly useful for using a country code field with several widgets, or a date field for both a timeline and a text facet. Previously each field could be used in only one widget!

[Image: search-multi-names]

 

Collection aliases

All the aliased groups of collections will now appear in the list of available collections, so just pick the name like any other collection. The UI also hides the core list by default to save some space.

[Image: search-aliases]

 

Enable only the Search app

Hue only uses the standard Solr API. This means that any Solr or Solr Cloud setup can also benefit from the dashboard UI. Here is how to customize Hue to only show the Search app and get started in a few clicks!

[Image: search-only]

 

Export and import dashboard

Until we get builtin support for exporting/importing any Hue document, here is a new way to back up or move your existing dashboards to other installations.

[Image: search-export]

 

Next!

A lot more is coming up, with a Date Widget for easily setting up a rolling timeline, more statistics and analytics facets!

Also in the pipeline is a revamp of the indexer designer for making collection index creation a 3-click operation.

Happy Searching!

 

As usual feel free to comment on the hue-user list or @gethue!

Hueber is launching, just tap for getting your Big Data Cloud!

Our society is becoming more and more ubiquitous. People want instant access to anything, ranging from cars, food, houses… to news, files, messages, photos and even videos.

Cloud Computing and Big Data are also becoming mainstream, everybody has heard of it and wants to use it.

 

A big potential is there…

… and today we have merged the power of Cloud computing with the Mobile Sharing economy to create…

 

 

[Image: huerber-logo]

“Hueber”! Just tap your phone to get Hadoop in the Cloud!

 

 

 

Do you need some Computers?

Just open the Hueber app on your phone and send a request:

[Image: hueber-city]

 

Want them in other countries?

Just zoom out a bit!

[Image: hueber-countries]

 

Want a Cloud in another continent?

Zoom even more!

[Image: hueber-continent]

 

The service matches you with any available Cloud resource…

 

[Image: hueber-hawaii]

 

… or works even in places with complicated names

 

[Image: huerber-lan]

 

It is now easy to put all your Big Data in a Cloud!

 

Later, when you are analysing your data, in case you need help for a SQL query?

Just request a Data Analyst close to you for 5 minutes.

[Image: hueber-dataanalyst]

 

Want a pretty graph for a report?

Get someone with DataViz skills! Just beware, it will cost you 2x more!

[Image: hueber-dataviz]

 

 

Give Hueber a try, it is free and open source, and you are welcome to send feedback!

 

Go to Hueber!

 


New Apache Oozie Workflow, Coordinator & Bundle Editors

Oozie was one of the first major apps in Hue. We are continuously investing in making it better and just made a major jump with its editor.

This revamp of the Oozie Editor brings a new look and requires much less knowledge of Oozie! Workflows now support dozens of new functionalities and require just a few clicks to be set up!

 

 

The files used in the videos come with the Oozie Examples.

In the new interface, only the most important properties of an action need to be filled in, and quick links for verifying paths and other jobs are offered. Hive and Pig script files are parsed in order to extract their parameters and directly propose them with autocomplete. The advanced functionality of an action is available in a new kind of popup with much less friction, as it just overlaps the current node.

 

 

Two new actions have been added:

  • HiveServer2
  • Spark

[Image: new-spark-hs2-actions]

And the user experience of Pig and Sub-workflows is simplified.

 

Decision node support has been improved, and copying an existing action is now just a matter of drag & drop. More layouts are now possible, as the ‘ok’ and ‘end’ nodes can be individually changed.

[Image: oozie-avanced-action-options]

 

Coordinators have been vastly improved! The notion of Oozie datasets is not needed anymore. The editor pulls the parameters of your workflow and offers 3 types of inputs:

  • parameters: constant or Oozie EL function like time
  • input path: parameterize an input path dependency and wait for it to exist, e.g.
  • output path: like an input path but does not need to exist for starting the job

[Image: oozie-new-coordinator]

 

The dreaded UTC time zone format is now directly provided, either by the calendar or with some helper widgets.

[Image: oozie-new-submit-popup]

 

Sum-up

In addition to providing a friendlier end user experience, this new architecture opens up for innovations.

First, it makes it easy to add new Oozie actions in the editor. But most importantly, workflows are persisted using the new Hue document model, meaning their import/export is simplified and will soon be available directly from the UI. This model also enables the future generation of your workflows by just drag & dropping saved Hive, Pig or Spark jobs directly into the workflow. No need to manually duplicate your queries on HDFS!

This also opens the door to one-click scheduling of any job saved in Hue, as the coordinators are much simpler to use now. While we are continuing to polish the new editor, the Dashboard section of the app will see a major revamp next!

 

As usual feel free to comment on the hue-user list or @gethue!

 

Note

Old workflows are not automatically converted to the new format. Hue will try to import them for you, and open them in the old editor in case of problems.

[Image: oozie-import-try]

A new import / export is planned for Hue 4. It will let you export workflows in both the XML and JSON Hue formats and import from Hue’s format.

Developer Guide on Upgrading Apps for Hue 3.8

The upcoming Hue 3.8 internals have gone through some major upgrades to improve performance, robustness, and security. The major change stems from upgrading Django from 1.4.5 to 1.6.10, which comes with a significant number of performance enhancements, bug fixes, and removals of deprecated features.

This post details how Hue developers that are building against the Hue SDK can upgrade their applications to work with Hue 3.8.


Python version

Python 2.6.5 is now the minimum version, 2.6.0 is not enough anymore.

Django Upgrade

Hue was upgraded from Django 1.4.5 to Django 1.6.10. While the Django release notes for 1.5 and 1.6 go into extensive detail on how to upgrade, here are the main issues we encountered while upgrading Hue.

Json

We backported Django 1.7’s JsonResponse to simplify responding with Json records. So views that used to be written as:

def view(request):
    value = {"x": "y"}
    return HttpResponse(json.dumps(value))

Can now be written as:

def view(request):
    value = {"x": "y"}
    return JsonResponse(value)

One thing to note, though, is that Django will now by default raise an error if a non-dictionary is serialized. This is to protect against attacks on older browsers. Here is how to disable this error:

def view(request):
    value = ["x", "y"]
    return JsonResponse(value, safe=False)

We recommend that developers migrate over to returning objects. Hue itself should be completely transitioned by 3.8.0.

Urls and Reverse

Django’s django.core.urlresolvers.reverse (and therefore the url function in mako scripts) now automatically escapes arguments, so uses of these functions should be changed from:

<a href="${ url('useradmin.views.edit_user', username=urllib.quote(user.username)) }">...</a>

To:

<a href="${ url('useradmin.views.edit_user', username=user.username) }">...</a>

StreamingHttpResponse

In order to return a generator from a view, it is now required to use StreamingHttpResponse. When testing, change code from

 csv_response = self.c.post(reverse('search:download'), {
         'csv': True,
         'collection': json.dumps(self._get_collection_param(self.collection)),
         'query': json.dumps(QUERY)
 })
csv_response_content = csv_response.content

To:

csv_response = self.c.post(reverse('search:download'), {
        'csv': True,
        'collection': json.dumps(self._get_collection_param(self.collection)),
        'query': json.dumps(QUERY)
 })
csv_response_content = ''.join(csv_response.streaming_content)
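The reason for the change is that a streaming response exposes an iterator instead of a pre-rendered body. The stand-in classes below (illustrative only, not Django’s real implementations) show why the test has to join streaming_content:

```python
# Minimal stand-ins for the two response types (illustrative, not Django's code)
class HttpResponse:
    def __init__(self, content):
        self.content = content  # the body is fully rendered up front

class StreamingHttpResponse:
    def __init__(self, streaming_content):
        self.streaming_content = streaming_content  # an iterator, consumed lazily

def csv_rows():
    # A generator view body, e.g. rows of a CSV download
    for row in [["a", "b"], ["c", "d"]]:
        yield ",".join(row) + "\n"

resp = StreamingHttpResponse(csv_rows())
body = "".join(resp.streaming_content)  # consume the generator, as in the test above
```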

 

Static Files

As described in the NGINX post above, Hue can now serve its static files with a separate webserver like NGINX. This can substantially cut down the number of requests that the Hue frontend needs to perform in order to render a page:

[Image: without-nginx]
[Image: with-nginx]

This change will break applications using the old way of serving static files. It will also cause conflicts to any user back porting patches that touch static files from Hue 3.8.0 and above to older versions of Hue.

In order to make the transition, do:

  • Move static files from /apps/$name/static to /apps/$name/src/$name/static
  • Update .mako files to change from:
    <link rel="stylesheet" href="/metastore/static/css/metastore.css">

    To:

    <link rel="stylesheet" href="${ static('metastore/css/metastore.css') }">
  • Update the “ICON” in apps/$name/src/help/settings.py from:
    ICON = "/help/static/art/icon_help_24.png"

    To:

    ICON = "help/art/icon_help_24.png"
    
  • Update any Python templates from:
    def view(request):
        data = {'image': "/help/static/art/icon_help_24.png"}
        return render("template.mako", request, data)

    To:

    from django.contrib.staticfiles.storage import staticfiles_storage

    …

    def view(request):
        data = {'image': staticfiles_storage.url("help/art/icon_help_24.png")}
        return render("template.mako", request, data)

Finally, in order to run Hue with debug=False, it is now required to first run either make apps or ./build/env/bin/hue collectstatic to gather all the files into the build/static directory. This is not necessary with debug=True, where Hue will serve the static files directly from the /apps/$name/src/$name/static directory.

 

Next!

Django 1.8 was released this month! This is the second LTS release, and 1.4 support will be dropped in 6 months. The major dependency of 1.8 is that it requires Python 2.7, which is not the default Python version on older LTS OSes still used nowadays.

 

As usual feel free to comment on the hue-user list or @gethue!

Hive 1.1 and Impala 2.2 support

Hive made a big jump by finally graduating to its 1.0 version. It is even at 1.1 now (equivalent to 0.14). Hue’s Hive and Impala Editors have been updated to take advantage of a series of their new features.

[Image: hive-editor-map]

This release finally unifies the HiveServer2 API. All its API calls (e.g. getting the logs of a query) now belong to the upstream version. This makes Hue 100% compatible with any Hive 1.1+ version going forward and will solve a lot of integration headaches.

 

Another advantage is the support of the new columnar format which makes the fetching of result set data much faster.

 

If you are looking at the SSL configuration, check this previous blog post.

 

One more feature landing in Hue 3.8 that could interest some users is Thrift HTTP support. We added this feature to improve the interaction with the HBase App but can re-use it for free for HiveServer2.

 

By configuring HiveServer2 in HTTP mode:

<property>
  <name>hive.server2.transport.mode</name>
  <value>http</value>
</property>

 

Hue will automatically pick it up if it points to a good hive-site.xml.

 

Another feature is the development of a Notebook UI (currently a beta version) that lets you type SQL. You can now do quick prototyping and graphing!

[Image: sql-notebook]

 

Next!

Coming up next is support for HiveServer2 High Availability (HA) in order to transparently handle rolling upgrades or server crashes. The new Notebook App is in heavy development and will share the same UI as the SQL editors.

More user friendliness is also on the way, with a visual display of table statistics and the autocomplete of nested types!

 

As usual feel free to comment on the hue-user list or @gethue!

Oozie Dashboard Improvements

In the upcoming Hue 3.8, the Oozie Dashboard got several improvements making it and its navigation even more intuitive (for the Editor revamp, see this). Here is a video demo that sums them up:

New Oozie features

In Workflow dashboard:

  • Job parent column (parent can be nothing or a workflow or a coordinator)
  • Job parent “Submitted by” filter


[Image: parent]

 

  • Navigate to Sub-Workflow action and editor pages from a submitted workflow graph

[Image: graph]

 

  • Navigate to parent Job from an action/workflow in a submitted Workflow

[Image: navigate]

 

  • Update end time of running Coordinator

[Image: endtime]

 

Next!

A lot more is coming up, and rebasing the workflow dashboard on the editor is being evaluated. Stay tuned!

 

As usual feel free to comment on the hue-user list or @gethue!

 

Beta of new Notebook Application for Spark & SQL

Last year we released Spark Igniter to enable developers to submit Spark jobs through a Web Interface. While this approach worked, the UX left a lot to be desired. Programs had to implement an interface, be compiled beforehand and YARN support was lacking. We also wanted to add support for Python and Scala, focusing on delivering an interactive and iterative programming experience similar to using a REPL.

[Image: notebook-1]

 

This is why we started developing a new Spark REST Job Server that could provide these missing functionalities. On top of it, we revamped the UI to provide a Python Notebook-like feeling.

Note that this new application is pretty new and labeled as ‘Beta’. This means we recommend trying it out and contributing, but its usage is not officially supported yet as the UX is going to evolve a lot!

This post describes the Web Application part. We are using Spark 1.3 and Hue master branch.

 

 

Based on the new Spark REST Job Server, it supports:

  • Scala
  • Python
  • Java
  • SQL
  • YARN

 

If the Spark app is not visible in the ‘Editor’ menu, you will need to unblacklist it from the hue.ini:

 

[desktop]
app_blacklist=

 

On the same machine as Hue, go to the Hue home directory:

 

If using the package install:

cd /usr/lib/hue

 

If using Cloudera Manager:

cd /opt/cloudera/parcels/CDH/lib/
HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/-hue-HUE_SERVER-id
echo $HUE_CONF_DIR
export HUE_CONF_DIR

 

And start the Spark Job Server:

./build/env/bin/hue livy_server

 

You can customize the setup by modifying these properties in the hue.ini:

[spark]
# URL of the REST Spark Job Server.
server_url=http://localhost:8090/

# List of available types of snippets
languages='[{"name": "Scala", "type": "scala"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'

# Uncomment to use the YARN mode
## livy_server_session_kind=yarn

 

Next

This Beta version brings a good set of features, and a lot more is on the way. In the long term, we expect all the query editors (e.g. Pig, DBquery, Phoenix…) to use this common interface. Later, individual snippets could be dragged & dropped to build visual dashboards, and notebooks could be embedded like in Dropbox or Google Docs.

We are also interested in getting feedback on the new Spark REST Job Server and seeing what the community thinks about it (contributions are welcome ;).

As usual feel free to comment on the hue-user list or @gethue!

 
