Monday, November 18, 2013

Starting and Stopping

Automatically Starting and Stopping Hadoop and HBase

After installing Hadoop and HBase on OS X, I noticed that data in HBase would get confused after waking up from sleep.  Specifically, I was not able to scan data in an HBase table.  I could successfully perform a list operation within the HBase shell (i.e., a list of all my tables would appear).  However, if I tried a scan operation on any of the tables, I would see a host of stack traces at the console.  I used the steps below to automate starting and stopping Hadoop and HBase to:
  1. Save myself time running the start-xxx.sh commands
  2. Prevent HBase from freaking out after sleep
I would like to preface this post with two important points:
  1. I do not claim that the steps below fix the root cause of the HBase problem referenced above. However, the steps addressed the symptoms sufficiently to allow my environment to retain data after waking up from sleep.
  2. I am barely proficient in AppleScript.  The scripts below are hacked together with the goal of addressing the problem mentioned above.  The scripts are not very efficient or elegant.  If you'd like to improve them, please post a comment!

The Third Party Software

To help me automate starting and stopping my Hadoop and HBase services, I used a utility from Lagente Software called Scenario.  Scenario is pay-for-use software.  I am told that one can use a combination of built-in OS X functionality and the free utility Sleepwatcher to achieve the same effect.  I could not, however, get Sleepwatcher to work, so after monkeying with it for over an hour, I decided the $5 for Scenario was money well spent.

The Scripts

After installing Scenario, I authored four AppleScripts:
  1. One for starting DFS, Yarn, and HBase immediately after logging in
  2. One for stopping HBase, Yarn, and DFS just prior to logging out
  3. One for stopping HBase prior to sleeping
  4. One for starting HBase immediately after waking up from sleep
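
The ordering in the list above can be sketched as a small shell helper. This is not the AppleScript used in this post; it is a minimal illustration that assumes the standard start-dfs.sh, start-yarn.sh, start-hbase.sh scripts and their stop- counterparts are on the PATH. The event names (login, logout, sleep, wake), the function names, and the DRY_RUN switch are made up for the example.

```shell
#!/bin/sh
# Sketch only: maps each of the four events to the start/stop scripts,
# in the order given in the list above.
# Assumptions (not from the post): the event names, the function names,
# the DRY_RUN switch, and that the Hadoop/HBase scripts are on the PATH.

# Print the commands for an event, one per line.
commands_for() {
  case "$1" in
    login)  printf '%s\n' start-dfs.sh start-yarn.sh start-hbase.sh ;;
    logout) printf '%s\n' stop-hbase.sh stop-yarn.sh stop-dfs.sh ;;
    sleep)  printf '%s\n' stop-hbase.sh ;;
    wake)   printf '%s\n' start-hbase.sh ;;
    *)      echo "unknown event: $1" >&2; return 1 ;;
  esac
}

# Run (or, with DRY_RUN=1, just print) the commands for an event.
# Stops at the first command that fails.
run_event() {
  cmds=$(commands_for "$1") || return 1
  for cmd in $cmds; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: $cmd"
    else
      "$cmd" || return 1
    fi
  done
}
```

Calling `DRY_RUN=1 run_event login` prints the three start commands without running them, which is a cheap way to check the order. Note that HBase is started last and stopped first, since it depends on DFS being up.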

The Configuration

I "configured" each of the scripts to run at the appropriate time by placing them in folders as directed by the Scenario Programming Guide.
