Wednesday, January 1, 2014

IR_Black Theme in IDLE Shell

I prefer to use the IDLE Shell for interactive Python programming.  I do not, however, particularly like the default themes.  I created a theme based on the somewhat popular IR_Black theme.  Here is how I set up IDLE:

  • In ~/.idlerc/config-main.cfg I added the following:

[Theme]
default = 0
name = IR_Black

  • In ~/.idlerc I created a file named config-highlight.cfg.  In that file I added the following:

[IR_Black]
break-foreground = #E0E2E4
builtin-foreground = #95cbf9
comment-foreground = #66747B
console-foreground = #E0E2E4
cursor-foreground = #E0E2E4
definition-foreground = #95cbf9
error-foreground = #fc6863
hilite-foreground = #E0E2E4
hit-foreground = #E0E2E4
keyword-foreground = #93C763
normal-foreground = #E0E2E4
stderr-foreground = #fc6863
stdout-foreground = #95cbf9
string-foreground = #fda368
break-background = #242424
builtin-background = #242424
comment-background = #242424
console-background = #242424
definition-background = #242424
error-background = #242424
hilite-background = #36397e
hit-background = #7f397b
keyword-background = #242424
normal-background = #242424
stderr-background = #242424
stdout-background = #242424
string-background = #242424
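
To confirm the theme took effect, relaunch IDLE and check that IR_Black appears under the Highlighting tab of IDLE's preferences.  One way to launch IDLE from a terminal (assuming a stock Python 2.7 install, where IDLE ships as the idlelib.idle module):

python -m idlelib.idle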

Enjoy!

Accessing HBase using Thrift/Python

Install Thrift (and its Dependencies)

To access HBase via Python you must first download, compile, and install Apache Thrift.  As of this writing, the Apache Thrift website does not offer pre-compiled binaries for OS X.  The process for downloading, compiling, and installing Thrift is documented on the Thrift web site.  However, I have recorded my steps below in case someone is interested in following them.

Download and install libevent
  1. Download libevent (here is the link for 2.0.12, which is the version I used)
  2. Untar libevent
  3. Compile libevent by executing the following three commands
./configure --prefix=/usr/local 
make
sudo make install

Download and install Boost
  1. Download Boost (here is the link for 1.54, which is the version I used)
  2. Untar Boost
  3. Compile Boost by executing the following two commands
./bootstrap.sh
sudo ./b2 threading=multi address-model=64 variant=release stage install

Note:  I received numerous warnings and errors but ignored them.

Download and install Thrift
  1. Download Thrift (here is the link for 0.9.1, which is the version I used)
  2. Untar Thrift
  3. Compile Thrift by executing the following command (the remaining build and install steps are sketched below)
./configure --prefix=/usr/local/ --with-boost=/usr/local --with-libevent=/usr/local
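
The configure step only prepares the build.  Assuming it completes cleanly, the build and install follow the same pattern as libevent above; this is the standard sequence rather than an exact transcript of my session:

make
sudo make install
thrift -version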


Monday, November 18, 2013

Starting and Stopping

Automatically Starting and Stopping Hadoop and HBase

After installing Hadoop and HBase on OS X, I noticed that data in HBase would get confused after waking up from sleep.  Specifically, I was not able to scan data in an HBase table.  I could successfully perform a list operation within the HBase shell (i.e., a list of all my tables would appear).  However, if I tried a scan operation on any of the tables, I would see a host of stack traces at the console.  I used the steps below to automate starting and stopping Hadoop and HBase to:
  1. Save myself time running the start-xxx.sh commands
  2. Prevent HBase from freaking out after sleep
I would like to preface this post with two important points:
  1. I do not claim that the steps below fix the root cause of the HBase problem referenced above. However, the steps addressed the symptoms sufficiently to allow my environment to retain data after waking up from sleep.
  2. I am barely proficient in AppleScript.  The scripts below are hacked together with the goal of addressing the problem mentioned above.  The scripts are not very efficient or elegant.  If you'd like to improve them, please post a comment!

The Third Party Software

To help me in my quest to automate starting and stopping my Hadoop and HBase services, I used a utility from Lagente Software called Scenario.  Scenario is pay-for-use software.  I am told that one can use a combination of built-in OS X functionality and the free utility Sleepwatcher to achieve the same effect.  I could not, however, get Sleepwatcher to work, so after monkeying with Sleepwatcher for over an hour, I decided the $5 for Scenario was money well spent.

The Scripts

After installing Scenario, I authored four AppleScripts (the shell commands each script wraps are sketched below):
  1. One for starting DFS, Yarn, and HBase immediately after logging in
  2. One for stopping HBase, Yarn, and DFS just prior to logging out
  3. One for stopping HBase prior to sleeping
  4. One for starting HBase immediately after waking up from sleep
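
Each script needs to do little more than call AppleScript's do shell script with the standard Hadoop and HBase start/stop scripts.  As a rough sketch, the login script runs the commands below; the logout script runs the stop-*.sh counterparts in reverse order:

start-dfs.sh
start-yarn.sh
start-hbase.sh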

The Configuration

I "configured" each of the scripts to run at the appropriate time by placing them in folders as directed by the Scenario Programming Guide

Saturday, November 2, 2013

Installing Hue

Installing Hue on OS X 10.8.5

For me, installing Hue on OS X was a pain in the ass.  Unfortunately, I didn't meticulously record each step I executed to install Hue so I'm authoring this post from memory.  Hopefully folks will find it useful.

Credits

First, I'd like to point out three sources that were valuable to me while performing the Hue install and configuration:
    * Hue's GitHub page.
    * The Hue section of the CDH4 Installation Guide.
    * A stackoverflow post about Configuring Hue With CDH4.3.

Preparing for the Installation

Hue relies on several external software libraries.  Even before I downloaded Hue, I prepared my environment by:
    1.  Installing JDK 1.7 (which is required to compile Hue)
    2.  Installing MacPorts
    3.  Installing easy_install

Once the above-listed software (and their dependencies) was installed, I used MacPorts to obtain the following libraries:

liblxml
libxml2
libxslt
mysql55
sqlite3

I used easy_install to obtain the simplejson library.
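
For reference, the MacPorts and easy_install invocations look roughly like this (port and package names are copied from the list above; adjust them to whatever MacPorts currently offers):

sudo port install liblxml libxml2 libxslt mysql55 sqlite3
sudo easy_install simplejson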

Obtaining the Source Code

I downloaded the Hue source by issuing the following command at the command prompt:

git clone http://github.com/cloudera/hue.git

Modifying the Source

Depending on your configuration, this step may not be necessary.  In my case, I did not establish an 'hdfs' user for installing Hadoop and owning the HDFS directory and files.  If the owner of the HDFS root directory is someone other than 'hdfs', you must change the DEFAULT_HDFS_SUPERUSER in the hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py file.
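
A quick way to locate the line to edit (the path and constant name are as given above):

grep -n DEFAULT_HDFS_SUPERUSER hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py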

Building Hue from Source

Note: As mentioned above, Hue can only be built using JDK 1.7.  If needed, be sure to update your JAVA_HOME to point to the appropriate directory.  In my case, I executed the following command prior to issuing the commands below:  export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home

I built the Hue source by issuing the following commands at the command prompt:

cd hue
make apps

Note: I seem to recall I had a problem with the mysql55 library obtained from MacPorts.  If I remember correctly, the make command was expecting a MacPorts library (named mysql5_devel) that has since been obsoleted by mysql55.  Unfortunately, I do not recall how I worked around this issue.

Configuring Hadoop

Add the following to hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Add the following to core-site.xml:

<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
<property>  
  <name>hadoop.proxyuser.httpfs.hosts</name>  
  <value>*</value>  
</property>  
<property>  
  <name>hadoop.proxyuser.httpfs.groups</name>  
  <value>*</value>  
</property> 
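
These proxyuser settings are only read when the daemons start, so if HDFS is already running, restart it (using the same scripts described in the Hadoop install post below) before testing Hue:

stop-dfs.sh
start-dfs.sh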

Configuring Hue

Add the following to the [[hdfs_clusters]] section of the hue/desktop/conf/pseudo-distributed.ini:

webhdfs_url=http://localhost:50070/webhdfs/v1/
hadoop_hdfs_home=/Users/mkeating/Dev/fs_root/hadoop-hdfs
hadoop_bin=/Users/mkeating/Dev/hadoop-2.2.0/bin/hadoop
hadoop_conf_dir=/Users/mkeating/Dev/hadoop-2.2.0/etc/hadoop

Add the following to the [[yarn_clusters]] section of the hue/desktop/conf/pseudo-distributed.ini:

hadoop_mapred_home=/Users/mkeating/Dev/fs_root/hadoop-mapreduce
hadoop_bin=/Users/mkeating/Dev/hadoop-2.2.0/bin/hadoop
hadoop_conf_dir=/Users/mkeating/Dev/hadoop-2.2.0/etc/hadoop

Starting Hue


Start Hue by executing the following command at the command prompt:

hue/build/env/bin/hue runserver

Log into Hue by navigating to:  http://localhost:8000

Note:  When you log in to Hue, you will see a number of error messages indicating problems with the current configuration.  Most of the messages are likely legitimate.  One message, however, can be misleading.  If you get a message that looks like the one below, you should first test Hue's access to HDFS (through the File Browser) before troubleshooting.

    Current value: http://localhost:50070/webhdfs/v1/
    Filesystem root '/' should be owned by 'hdfs'

According to this post on Google Groups, this error message will always appear given our configuration (even if Hue is working properly).



Installing Hadoop and HBase

Installing Hadoop and HBase on OS X 10.8.5

These are the steps I used for installing Hadoop, HBase, and Pig on OS X 10.8.5.  The versions of software I am using are:
    * Hadoop 2.2.0
    * HBase 0.96.0
    * Pig 0.12.0

Credits

First, I'd like to credit two sources that were invaluable to me while performing this install and configuration:
    * Andy C's Installing Hadoop 2 on a Mac blog post.
    * Freddy's Hadoop & Hbase on OSX 10.8 Mountain Lion blog post.

Get Software

Download the Hadoop 2.2.0 tarball from a mirror near you.
Download the HBase 0.96.0 tarball from a mirror near you.
Download the Pig 0.12.0 tarball from a mirror near you.

Install Software

Extract each of the above mentioned tarballs in a directory of your choice.  I exploded them in /Users/mkeating/Dev.

Configure Environment

I prefer to use the .bash_profile file to maintain my settings between terminal sessions.  I added the following to my .bash_profile.

export JAVA_HOME="$(/usr/libexec/java_home -v 1.6)"
export HADOOP_INSTALL="/Users/mkeating/Dev/hadoop-2.2.0"
export HBASE_HOME="/Users/mkeating/Dev/hbase-0.96.0-hadoop2"
export PIG_HOME="/Users/mkeating/Dev/pig-0.12.0"

export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin:$HBASE_HOME/bin:$PIG_HOME/bin:$SUBLIME_TEXT_BIN
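
After saving .bash_profile, pick up the new settings in the current terminal and confirm that the Hadoop binaries are on the PATH; hadoop version is a quick sanity check:

source ~/.bash_profile
hadoop version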

Create HDFS and MapReduce Directories

Create directories on your local file system to store HDFS and MapReduce data.  I created the following two directories:
    * /Users/mkeating/Dev/fs_root/hadoop-hdfs
    * /Users/mkeating/Dev/fs_root/hadoop-mapreduce
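
For example, using the paths above:

mkdir -p /Users/mkeating/Dev/fs_root/hadoop-hdfs
mkdir -p /Users/mkeating/Dev/fs_root/hadoop-mapreduce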

Configure Hadoop

Add the following to core-site.xml:

<configuration>  
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Add the following to the bottom of hadoop-env.sh:

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Note: I'm no longer sure this is necessary or has any effect on the installation.  At one point, I believed the addition of the above suppressed "Unable to load realm info from SCDynamicStore" errors.

Add the following to hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Users/mkeating/Dev/fs_root/hadoop-hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Users/mkeating/Dev/fs_root/hadoop-hdfs/datanode</value>
  </property>
</configuration>

Add the following to the bottom of yarn-env.sh:

YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Note: I'm no longer sure this is necessary or has any effect on the installation.  At one point, I believed the addition of the above suppressed "Unable to load realm info from SCDynamicStore" errors.

Add the following to yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>


Start Hadoop

First format the namenode by executing the following command at the command prompt:

    hadoop namenode -format

Next, start HDFS and Yarn by executing the following commands at the command prompt:

    start-dfs.sh
    start-yarn.sh

Test Hadoop

Test your Hadoop installation/configuration by executing the following at the command prompt:

    jps

You should see output that looks something like this:

5053 Jps
2598 SecondaryNameNode
2416 NameNode
4933 Main
4915 HMaster
2704 ResourceManager
2498 DataNode
2789 NodeManager

Assuming no errors, you can run two more simple tests to check the installation.  First, cd to the $HADOOP_INSTALL directory.  Then copy a file to HDFS by executing the following commands.

hadoop fs -mkdir /user
hadoop fs -mkdir /user/<username> (where username is your logon ID)
hadoop fs -put LICENSE.txt
hadoop fs -ls

Finally try to run a MapReduce job by executing the following:

cd share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount LICENSE.txt out
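
If the job completes, the word counts land in the out directory in HDFS.  Listing the directory and printing the result file is an easy way to confirm (the part file name may vary slightly):

hadoop fs -ls out
hadoop fs -cat out/part-r-00000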


Configure HBase

Add the following to hbase-env.sh:

export HBASE_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Add the following to hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>


Update the Hadoop JARs that come with HBase

HBase 0.96.0 bundles Hadoop 2.1.0-beta JAR files.  To prevent errors, replace the 2.1.0-beta JARs in hbase-0.96.0-hadoop2/lib with the 2.2.0 JARs under the hadoop-2.2.0/share/hadoop directory.

Note: I could not find a replacement for hadoop-client-2.1.0-beta.jar so I left it in the hbase-0.96.0-hadoop2/lib directory.  So far, I have not run into any problems.
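
One possible way to script the swap, using the install locations above (this is only a sketch; list the JARs before and after to make sure nothing unexpected was removed):

cd /Users/mkeating/Dev/hbase-0.96.0-hadoop2/lib
ls hadoop-*-2.1.0-beta.jar
# For each bundled JAR except hadoop-client, remove the beta copy and copy in
# the matching 2.2.0 JAR.  For example, for hadoop-common:
rm hadoop-common-2.1.0-beta.jar
find /Users/mkeating/Dev/hadoop-2.2.0/share/hadoop -name hadoop-common-2.2.0.jar -exec cp {} . \;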


Start HBase

Start HBase by executing the following command at the command prompt:

    start-hbase.sh


Test HBase

Test your HBase installation/configuration by launching the HBase shell and creating a table.  First, type the following command at the command prompt to open the HBase shell.

    hbase shell

Next, create a new table and insert a value by executing the following commands within the HBase shell:

create 'my_table', 'col_fam'
put 'my_table', 'row1', 'col_fam:a', 'value1'
put 'my_table', 'row1', 'col_fam:b', 'value2'
put 'my_table', 'row2', 'col_fam:a', 'value3'
scan 'my_table'
get 'my_table', 'row1'
disable 'my_table'
drop 'my_table'