Sunday, October 17, 2010

Using HBaseStorage in Pig

Pig is a data analysis tool built on Hadoop. It can read from any data source as long as you have the right InputFormat for that source. HBase is a Bigtable-like data storage system that is also built on Hadoop. PIG-1205 has the details of how HBaseStorage is implemented. Here I'd like to talk about how to use it.
1. Register three jars: hbase-0.20.6.jar, zookeeper.jar, guava.jar
2. Tell the TaskTracker where the HBase configuration files are. One convenient method is to copy the files under $HBASE_HOME/conf to $HADOOP_HOME/conf
3. Import the package org.apache.pig.backend.hadoop.hbase, or use the fully-qualified class name: org.apache.pig.backend.hadoop.hbase.HBaseStorage
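
Put together, steps 1 and 3 might look roughly like the script below. This is only a sketch: the jar paths, the table name users, and the columns info:name and info:age are made up for illustration, and whether the table is addressed with the hbase:// prefix or by its bare name may depend on your Pig version. It also assumes step 2 has already been done, so the tasks can find the HBase configuration.

```pig
-- Step 1: register the three jars (placeholder paths, adjust to your install).
REGISTER /path/to/hbase-0.20.6.jar;
REGISTER /path/to/zookeeper.jar;
REGISTER /path/to/guava.jar;

-- Step 3: load from HBase using the fully-qualified class name.
-- 'users', 'info:name' and 'info:age' are hypothetical; the constructor
-- argument is a space-separated list of family:qualifier columns.
raw = LOAD 'hbase://users'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:age')
      AS (name:chararray, age:chararray);

-- Print the loaded tuples to check that the connection works.
DUMP raw;
```

If the script fails because it cannot reach ZooKeeper or cannot find the table, the HBase configuration from step 2 is the first thing to check.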

1 comment:

Corbin Hoenes said...

Thanks Jeff, very helpful. I found this article, which describes another way: using HADOOP_CLASSPATH to get the HBase jars and configs onto the classpath on your nodes, so that Pig/MapReduce jobs can talk to HBase.

http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description