Saturday, October 31, 2009

Hadoop Plugin for Eclipse 3.5.1 Galileo

I recently started trying out Apache Hadoop and noticed that several people had written about its Eclipse plugin. I checked out the latest source code, but it seems the plugin is no longer included, so I checked out version 0.19.2 instead, the newest version I could find that still had it.

I tried the plugin from Hadoop 0.19.2, but it did not work well on Eclipse 3.5.1 Galileo.
When I tried to run a Hadoop program, the error log said that a class could not be found and that the plugin could not be instantiated. Looking through the source, I found that the plugin was using an "internal" Eclipse class, which no longer seems to exist in the latest Eclipse. I changed the import statements, fixed one bug related to null checking, and finally got the plugin working.

===========================
The system configuration was:

Host OS : Mac OS X 10.5
Guest OS on VMware : Ubuntu 9.04

Eclipse 3.5.1 Galileo
Hadoop 0.19.2
===========================

I uploaded the fixed plugin here. In case anyone wants to use it, the installation and usage notes below should help.


First, you need to install Hadoop. I prefer version 0.19.2 rather than the latest release.
The Quick Start guide helps greatly. The latest release (currently 0.20.1) splits its configuration across separate files, but in 0.19.2 you can find the same entries in conf/hadoop-site.xml (the first time, you might need to copy them from hadoop-default.xml). Before going on to the next step, start Hadoop and create a directory named input on the Hadoop Distributed File System (HDFS). You may also want to copy some text files into it, as is done in the Quick Start.
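
To make the configuration step a little more concrete, here is a minimal sketch in plain Java (no Hadoop classes) that prints the kind of hadoop-site.xml a single-node test setup would use. The class name HadoopSiteXml and its methods are made up for illustration, and the values (hdfs://localhost:9000, localhost:9001, replication 1) are assumptions based on a typical pseudo-distributed setup, so adjust them to your own machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders a minimal conf/hadoop-site.xml for a single-node test setup (illustrative only). */
class HadoopSiteXml {

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the <configuration> document from name/value pairs, in insertion order. */
    static String render(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n");
        xml.append("<configuration>\n");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            xml.append("  <property>\n");
            xml.append("    <name>").append(escape(e.getKey())).append("</name>\n");
            xml.append("    <value>").append(escape(e.getValue())).append("</value>\n");
            xml.append("  </property>\n");
        }
        xml.append("</configuration>\n");
        return xml.toString();
    }

    /** Assumed values for a single-node setup; change them to match your installation. */
    static Map<String, String> singleNodeDefaults() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("fs.default.name", "hdfs://localhost:9000"); // assumed NameNode address
        p.put("mapred.job.tracker", "localhost:9001");     // assumed JobTracker address
        p.put("dfs.replication", "1");                     // a single copy is enough on one node
        return p;
    }

    public static void main(String[] args) {
        // Print the file contents; save them as conf/hadoop-site.xml if they look right.
        System.out.print(render(singleNodeDefaults()));
    }
}
```

The host and port values chosen here are the ones you will need again later when defining the Hadoop location in Eclipse, so it is worth noting them down.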

Second, download the plugin, copy it into Eclipse's "plugins" directory, and start Eclipse. "Map/Reduce Project" will now appear when you create a new project. However, you cannot create such a project until the plugin is configured, so open "Preferences -> Hadoop Map/Reduce" and set the Hadoop installation directory (fig 1). After that, you can start creating and running Hadoop projects.

fig 1 : Setting the Hadoop installation directory

  1. Create a new Hadoop project; Eclipse will then lead you to the Hadoop plugin perspective. You should now see the blue elephant icons.
  2. Open "Window -> Show View -> Other -> MapReduce Tools -> Map/Reduce Locations" (fig 2).
  3. Click the blue elephant icon in the "Map/Reduce Locations" view and define a Hadoop location. For testing, edit "Location Name", "Host", and "Port" in the General tab, and also set "eclipse.plugin.jobtracker.host", "eclipse.plugin.jobtracker.port", and "fs.default.name" (fig 3, 4).
  4. Check that you can connect to HDFS from Eclipse: in "Project Explorer", open "DFS Locations".
  5. Create Hadoop programs (you can use this sample) and try running them. Select a class that has a main method, then choose "Right Click -> Run As -> Run on Hadoop" (fig 5).
  6. Choose the location and click Finish; the Hadoop output will then appear in the Console.

fig 2 : Map/Reduce Locations view and DFS explorer

fig 3, 4 : Hadoop Location setup


fig 5 : Running Hadoop Program
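
If "DFS Locations" in step 4 does not open, it may help to first confirm that the host and ports entered in step 3 are reachable from the machine running Eclipse. The following is a minimal sketch using only the JDK, not the plugin or any Hadoop API. The class name PortCheck is made up, and the default host and ports (localhost, 9000, 9001) are assumptions that you should replace with your own values.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Checks whether a TCP port accepts connections, using only the JDK (illustrative only). */
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: replace them with the values entered in the plugin dialog.
        String host = args.length > 0 ? args[0] : "localhost";
        int[] ports = {9000, 9001}; // assumed HDFS and JobTracker ports
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 2000));
        }
    }
}
```

A successful connection only shows that something is listening on the port. It does not prove that the listener is Hadoop, so treat this as a first sanity check before looking at the plugin settings.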