Saturday, November 2, 2013

Installing Hadoop and HBase

Installing Hadoop and HBase on OS X 10.8.5

These are the steps I used to install Hadoop, HBase, and Pig on OS X 10.8.5.  The versions of software I am using are:
    * Hadoop 2.2.0
    * HBase 0.96.0
    * Pig 0.12.0

Credits

First, I'd like to credit two sources that were invaluable to me while performing this install and configuration:
    * Andy C's Installing Hadoop 2 on a Mac blog post.
    * Freddy's Hadoop & Hbase on OSX 10.8 Mountain Lion blog post.

Get Software

Download the Hadoop 2.2.0 tarball from a mirror near you.
Download the HBase 0.96.0 tarball from a mirror near you.
Download the Pig 0.12.0 tarball from a mirror near you.

Install Software

Extract each of the above-mentioned tarballs into a directory of your choice.  I extracted them in /Users/mkeating/Dev, as shown below.
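For example, assuming the tarballs were downloaded to ~/Downloads (exact file names may differ slightly depending on the mirror and build you grabbed):

    tar -xzf ~/Downloads/hadoop-2.2.0.tar.gz -C /Users/mkeating/Dev
    tar -xzf ~/Downloads/hbase-0.96.0-hadoop2-bin.tar.gz -C /Users/mkeating/Dev
    tar -xzf ~/Downloads/pig-0.12.0.tar.gz -C /Users/mkeating/Dev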

Configure Environment

I prefer to use the .bash_profile file to maintain my settings between terminal sessions.  I added the following to my .bash_profile.

export JAVA_HOME="$(/usr/libexec/java_home -v 1.6)"
export HADOOP_INSTALL="/Users/mkeating/Dev/hadoop-2.2.0"
export HBASE_HOME="/Users/mkeating/Dev/hbase-0.96.0-hadoop2"
export PIG_HOME="/Users/mkeating/Dev/pig-0.12.0"

export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin:$HBASE_HOME/bin:$PIG_HOME/bin:$SUBLIME_TEXT_BIN
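
After editing .bash_profile, reload it so the new settings take effect in your current terminal session (or just open a new one):

    source ~/.bash_profile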

Create HDFS and MapReduce Directories

Create directories on your local file system to store HDFS and MapReduce data.  I created the following two directories:
    * /Users/mkeating/Dev/fs_root/hadoop-hdfs
    * /Users/mkeating/Dev/fs_root/hadoop-mapreduce
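
Both can be created in one shot at the command prompt:

    mkdir -p /Users/mkeating/Dev/fs_root/hadoop-hdfs
    mkdir -p /Users/mkeating/Dev/fs_root/hadoop-mapreduce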

Configure Hadoop

Add the following to core-site.xml (this and the other Hadoop config files are in $HADOOP_INSTALL/etc/hadoop):

<configuration>  
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Add the following to the bottom of hadoop-env.sh:

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Note: I'm no longer sure this is necessary or has any effect on the installation.  At one point, I believed the addition of the above suppressed "Unable to load realm info from SCDynamicStore" errors.

Add the following to hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Users/mkeating/Dev/fs_root/hadoop-hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Users/mkeating/Dev/fs_root/hadoop-hdfs/datanode</value>
  </property>
</configuration>

Add the following to the bottom of yarn-env.sh:

YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Note: as with hadoop-env.sh above, I'm no longer sure this is necessary or has any effect on the installation; I originally believed it suppressed the same "Unable to load realm info from SCDynamicStore" errors.

Add the following to yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Note: Hadoop 2.2.0 requires the aux-service to be named mapreduce_shuffle; the mapreduce.shuffle value used by the 2.1.x betas is no longer accepted.


Start Hadoop

First, format the namenode by executing the following command at the command prompt (Hadoop 2 prints a deprecation warning for this form; hdfs namenode -format is the newer equivalent, and either works):

    hadoop namenode -format

Next, start HDFS and Yarn by executing the following commands at the command prompt:

    start-dfs.sh
    start-yarn.sh
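
Note: these scripts launch the daemons over SSH, so you may first need to enable Remote Login (System Preferences > Sharing) and set up passwordless SSH to localhost.  A minimal setup, assuming you have no existing SSH key:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    ssh localhost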

Test Hadoop

Test your Hadoop installation/configuration by executing the following at the command prompt:

    jps

You should see output that looks something like this (the HMaster and Main entries come from HBase, which is installed later in this post, so don't worry if they are missing at this point):

5053 Jps
2598 SecondaryNameNode
2416 NameNode
4933 Main
4915 HMaster
2704 ResourceManager
2498 DataNode
2789 NodeManager

Assuming no errors, you can run two more simple tests to check the installation.  First, cd to the $HADOOP_INSTALL directory.  Then copy a file to HDFS by executing the following commands, replacing <username> with your logon ID:

    hadoop fs -mkdir /user
    hadoop fs -mkdir /user/<username>
    hadoop fs -put LICENSE.txt
    hadoop fs -ls

Finally, try to run a MapReduce job by executing the following:

    cd share/hadoop/mapreduce
    hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount LICENSE.txt out
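
If the job completes, you can inspect the results; wordcount writes its output to a part-r-* file under the out directory:

    hadoop fs -cat out/part-r-00000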


Configure HBase

Add the following to hbase-env.sh (the HBase config files are in $HBASE_HOME/conf):

export HBASE_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Add the following to hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>


Update the Hadoop JARs that come with HBase

HBase 0.96.0 bundles Hadoop 2.1.0-beta JAR files.  To prevent errors, replace the 2.1.0-beta JARs in hbase-0.96.0-hadoop2/lib with the 2.2.0 JARs under the hadoop-2.2.0/share/hadoop directory.

Note: I could not find a replacement for hadoop-client-2.1.0-beta.jar, so I left it in the hbase-0.96.0-hadoop2/lib directory.  So far, I have not run into any problems.
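
One way to script the swap (a sketch, assuming the directory layout described above; double-check the results before starting HBase):

    cd $HBASE_HOME/lib
    # Swap each bundled 2.1.0-beta Hadoop JAR for its 2.2.0 counterpart;
    # JARs with no counterpart (e.g. hadoop-client) are left alone.
    for jar in hadoop-*-2.1.0-beta.jar; do
        name=${jar%-2.1.0-beta.jar}
        new=$(find $HADOOP_INSTALL/share/hadoop -name "$name-2.2.0.jar" | head -n 1)
        if [ -n "$new" ]; then
            rm "$jar" && cp "$new" .
        fi
    done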


Start HBase

Start HBase by executing the following command at the command prompt:

    start-hbase.sh


Test HBase

Test your HBase installation/configuration by launching the HBase shell and creating a table.  First, type the following command at the command prompt to open the HBase shell.

    hbase shell

Next, create a new table and insert a value by executing the following commands within the HBase shell:

    create 'my_table', 'col_fam'
    put 'my_table', 'row1', 'col_fam:a', 'value1'
    put 'my_table', 'row1', 'col_fam:b', 'value2'
    put 'my_table', 'row2', 'col_fam:a', 'value3'
    scan 'my_table'
    get 'my_table', 'row1'
    disable 'my_table'
    drop 'my_table'
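
Note that the disable step is required: HBase will refuse to drop a table that is still enabled.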
