HBase tutorial for beginners
December 22nd, 2008
First of all, HBase is a column-oriented database. However, you have to forget everything you have learned about tables, columns and rows in the RDBMS world. The data in an HBase instance is laid out more like a hashtable, and the data is immutable. Whenever you update the data, you are actually just creating a new version of it.
This tutorial will be very hands-on, without much explanation. There are a number of articles that describe column-oriented databases in detail. Check out my Delicious tag for some good ones, for instance jimbojw.com’s excellent introduction.
I used Apple OS X 10.5.6 for this tutorial; I am not sure whether it will work on Windows or Linux.
The goal of this tutorial is to create a data model for a blog and to read it from a Java program.
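To make the "hashtable with versions" idea concrete, here is a minimal in-memory sketch in plain Java. It is only a toy model and not the HBase API: the class name VersionedTable and its methods are made up for illustration. It assumes each cell keeps its values keyed by timestamp and that a read returns the newest one.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a versioned table (hypothetical, NOT the HBase API). */
public class VersionedTable {
    // row key -> column name ("family:qualifier") -> timestamp (newest first) -> value
    private final Map<String, Map<String, TreeMap<Long, String>>> rows =
            new TreeMap<String, Map<String, TreeMap<Long, String>>>();

    /** Adds a version of a cell; versions with other timestamps are kept. */
    public void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns all stored versions of a cell, newest first (empty if none). */
    public TreeMap<Long, String> versions(String row, String column) {
        Map<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return new TreeMap<Long, String>();
        }
        return columns.get(column);
    }

    /** Returns the newest version of a cell, or null if the cell does not exist. */
    public String get(String row, String column) {
        TreeMap<Long, String> v = versions(row, column);
        return v.isEmpty() ? null : v.firstEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.put("post1", "post:title", 1L, "Hello World");
        table.put("post1", "post:title", 2L, "Hello again"); // an "update" is a new version
        System.out.println(table.get("post1", "post:title"));             // newest value
        System.out.println(table.versions("post1", "post:title").size()); // both versions kept
    }
}
```

In this sketch an "update" never touches the old value; it only adds an entry with a newer timestamp, which is the behaviour described above.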
Get started
- Download the latest stable release from Apache. I went with the hbase-0.18.1 release.
- Unpack it, for instance to ~/hbase
- Edit ~/hbase/conf/hbase-env.sh and set the correct JAVA_HOME variable.
- Start hbase by running ~/hbase/bin/start-hbase.sh
Create a table
- Start the hbase shell by running ~/hbase/bin/hbase shell
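- Run create 'blogposts', 'post', 'image' in the shell
Now you have a table called blogposts with two column families, post and image. These families are “static”, like the columns in the RDBMS world. The columns inside a family (such as post:title) are not declared up front; they are created when you write to them.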
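As a rough illustration of the "family:qualifier" naming, the following plain Java sketch splits a column name and checks the family against the two families created above. The class ColumnNames and its methods are hypothetical helpers for this tutorial, not part of HBase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that mimics the family check; not part of HBase. */
public class ColumnNames {
    // The families are fixed when the table is created.
    private static final Set<String> FAMILIES =
            new HashSet<String>(Arrays.asList("post", "image"));

    /** Splits "family:qualifier" at the first colon. */
    public static String[] split(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected family:qualifier, got " + column);
        }
        return new String[] { column.substring(0, colon), column.substring(colon + 1) };
    }

    /** A column is acceptable when its family exists; the qualifier can be anything. */
    public static boolean isWritable(String column) {
        return FAMILIES.contains(split(column)[0]);
    }

    public static void main(String[] args) {
        System.out.println(isWritable("post:title"));     // known family
        System.out.println(isWritable("image:whatever")); // known family, new qualifier
        System.out.println(isWritable("comment:text"));   // family was never created
    }
}
```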
Add some data to the table
Run the following commands in the shell:
- put 'blogposts', 'post1', 'post:title', 'Hello World'
- put 'blogposts', 'post1', 'post:author', 'The Author'
- put 'blogposts', 'post1', 'post:body', 'This is a blog post'
- put 'blogposts', 'post1', 'image:header', 'image1.jpg'
- put 'blogposts', 'post1', 'image:bodyimage', 'image2.jpg'
Look at the data
Run get 'blogposts', 'post1' in the shell. This should output something like this:
| COLUMN | CELL |
|---|---|
| image:bodyimage | timestamp=1229953133260, value=image2.jpg |
| image:header | timestamp=1229953110419, value=image1.jpg |
| post:author | timestamp=1229953071910, value=The Author |
| post:body | timestamp=1229953072029, value=This is a blog post |
| post:title | timestamp=1229953071791, value=Hello World |
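The timestamp on each cell is its version. Assuming the default behaviour, where the put commands above did not supply a timestamp and the server clock was used, the value is milliseconds since the Unix epoch. A small sketch for turning it into a readable date (the class name CellTimestamp is made up for this example):

```java
import java.time.Instant;

/** Hypothetical helper for reading cell timestamps; not part of HBase. */
public class CellTimestamp {

    /** Interprets a cell timestamp as milliseconds since 1970-01-01T00:00:00Z. */
    public static Instant toInstant(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis);
    }

    public static void main(String[] args) {
        // Timestamp of the image:bodyimage cell in the output above.
        System.out.println(toInstant(1229953133260L)); // prints 2008-12-22T13:38:53.260Z
    }
}
```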
Summary part 1
So, what have we accomplished so far? We have created a table and added one ‘record’ to it. This record consists of the blog post itself and the images attached to it. So, how do we retrieve that data from a Java application?
Integrate with HBase from Java
In order to integrate with HBase you will need the following jar files in your classpath:
- commons-logging-1.0.4.jar
- hadoop-0.18.1-core.jar
- hbase-0.18.1.jar
- log4j-1.2.13.jar
All of these are found within ~/hbase/lib and ~/hbase.
OK, here’s the Java code:
[sourcecode language='java']
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.RowResult;
import java.util.HashMap;
import java.util.Map;
import java.io.IOException;

public class HBaseConnector {

    // Reads every column of one row into a map of "family:qualifier" -> value.
    public static Map<String, String> retrievePost(String postId) throws IOException {
        // HBaseConfiguration reads the HBase configuration files found on the classpath.
        HTable table = new HTable(new HBaseConfiguration(), "blogposts");
        Map<String, String> post = new HashMap<String, String>();
        RowResult result = table.getRow(postId);
        for (byte[] column : result.keySet()) {
            // Column names and cell values are raw bytes; convert both to strings.
            post.put(new String(column), new String(result.get(column).getValue()));
        }
        return post;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> blogpost = HBaseConnector.retrievePost("post1");
        System.out.println(blogpost.get("post:title"));
        System.out.println(blogpost.get("post:author"));
    }
}
[/sourcecode]
This code should print out ‘Hello World’ and ‘The Author’.
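The map returned by retrievePost is flat, with keys such as post:title and image:header. If you would rather work with the post and its images separately, one possible approach is to regroup the map by family. The sketch below is plain Java and needs no HBase jars; PostGrouper and groupByFamily are hypothetical names, and the sample data simply repeats the values stored earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that regroups a flat column map by family. */
public class PostGrouper {

    /** Turns "family:qualifier" -> value into family -> (qualifier -> value). */
    public static Map<String, Map<String, String>> groupByFamily(Map<String, String> flat) {
        Map<String, Map<String, String>> grouped = new TreeMap<String, Map<String, String>>();
        for (Map.Entry<String, String> entry : flat.entrySet()) {
            String column = entry.getKey();
            int colon = column.indexOf(':');
            // A name without a colon is treated as a family with an empty qualifier.
            String family = colon < 0 ? column : column.substring(0, colon);
            String qualifier = colon < 0 ? "" : column.substring(colon + 1);
            grouped.computeIfAbsent(family, f -> new TreeMap<String, String>())
                   .put(qualifier, entry.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by HBaseConnector.retrievePost("post1").
        Map<String, String> flat = new HashMap<String, String>();
        flat.put("post:title", "Hello World");
        flat.put("post:author", "The Author");
        flat.put("image:header", "image1.jpg");
        flat.put("image:bodyimage", "image2.jpg");

        Map<String, Map<String, String>> grouped = groupByFamily(flat);
        System.out.println(grouped.get("post").get("title")); // prints Hello World
        System.out.println(grouped.get("image"));             // prints {bodyimage=image2.jpg, header=image1.jpg}
    }
}
```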
Happy coding! Check out HBase’s Javadoc for more examples.
Please leave feedback in the comments.
This entry was posted on Monday, December 22nd, 2008 at 15:42 and is filed under Programming.
10 Responses
AlexAxe Says:
Hello,
Great job. But not enough info. Where can I read more?
Thank you
AlexAxe
RoninJoe Says:
Nice post. There are some more examples on the HBase wiki, but this is a nice “hello world”.
Thanks,
Joe
Amandeep Khurana Says:
Good info to get someone started.
Thanks!
check_writer Says:
What else do I need to do to make sure my configuration is read?
Kevin Says:
Hello test:
I want to download hbase 0.19.1.jar. Where can I download it?
I want to use it in Eclipse for Java.
How many jars do I need to prepare?
工作=磨練? » [yes]Hbase tutorial Says:
[...] Hbase Tutorial for beginners http://ole-martin.net/hbase-tutorial-for-beginners/ [...]
Baby Steps into HBase « write-only by Gregg Lind Says:
[...] Ole-Martin Mørk’s instructions was a breeze! Even though I know almost nothing about Java and the Java environment, I managed [...]
Christine Says:
Very nice,
it helped me for the start.
thank you
Ian Kallen Says:
Thanks for posting a real simple example. Since I’ve been curious about JRuby too, I ported the programmatic access example to ruby, see http://gist.github.com/246032
hferreira Says:
Hi,
I’m a newbie on this, just started today. How do I run the Java class? On which machine? In the Java code I don’t see any parameters like the IP or the port. Is there any special command to run a Java class or Java jar files on HBase?
Thanks in advance