HBase tips and tricks

  1. Save shell history with .irbrc: configure ~/.irbrc so that the command history of every HBase shell invocation is saved.

A minimal ~/.irbrc:

    $ more ~/.irbrc
    require 'irb/ext/save-history'
    IRB.conf[:SAVE_HISTORY] = 100
    IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb_history"
    Kernel.at_exit do
      IRB.conf[:AT_EXIT].each do |i|
        i.call
      end
    end
  1. Enable debug-level logging
Run debug at the shell prompt, or start the shell with -d:

    hbase> debug
    ./bin/hbase shell -d
  1. Counters: HBase offers a counter feature, which is very useful for collecting statistics.

    hbase(main):001:0> create 'account', 'id'
    0 row(s) in 1.1930 seconds
    hbase(main):002:0> incr 'account', '2014', 'id:n', 1
    COUNTER VALUE = 1
    hbase(main):004:0> get_counter 'account', '2014', 'id:n'
    COUNTER VALUE = 2

The prompt numbers jump from 002 to 004, so the counter value of 2 presumably reflects a second incr that is not shown.
  1. Scan query optimization

Scan is how data is read from HBase, and it is the costliest operation.
The optional startRow and stopRow bounds improve query performance. If neither is set, the scanner iterates over all rows.
Scans with a start and stop key are much faster because HBase does not have to read every row to find the data that matches the query or filter.
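The difference between an unbounded and a bounded scan can be sketched in plain Ruby. This is only an illustration under stated assumptions: the sorted array stands in for an HBase table, and `ROW_KEYS`, `full_scan` and `range_scan` are hypothetical names, not part of any HBase API.

```ruby
# Hypothetical in-memory stand-in for a table: 1000 row keys kept in sorted order.
ROW_KEYS = (1..1000).map { |i| format('%05d_2014', i) }.sort.freeze

# Unbounded scan: every row key is visited and tested against the condition.
def full_scan(keys, prefix)
  visited = 0
  hits = keys.select do |k|
    visited += 1
    k.start_with?(prefix)
  end
  [hits, visited]
end

# Bounded scan: seek to the first key >= start_row, then read until stop_row (exclusive).
def range_scan(keys, start_row, stop_row)
  first = keys.bsearch_index { |k| k >= start_row } || keys.size
  visited = 0
  hits = []
  keys[first..-1].each do |k|
    break if k >= stop_row
    visited += 1
    hits << k
  end
  [hits, visited]
end

full_hits, full_visited = full_scan(ROW_KEYS, '0042')
range_hits, range_visited = range_scan(ROW_KEYS, '0042', '0042~')
puts "full scan visited #{full_visited} rows, range scan visited #{range_visited} rows"
```

Both calls return the same ten keys, but the bounded version only touches the rows inside the key range.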
Here are some tricks:

  • Create an HBase table and populate it with data:

    create 'TS','cf'

    row id                             cf:desc
    card_number_year_month_day_time_o  transaction_amt  location         type    year  month
    100_2014_06_10_10_932845_ta        100              bangalore        credit  2014  6
    23989_2000_01_11_10_5468756_ta     45843745         bangalore india  debit   2000  5

2000 1
  • Avoid a full table scan:

Example: find all transactions made with card number X at the location bangalore.
Use a prefix or row-key filter with a regex or substring comparator to express the search condition, and set the start row to 'X' and the stop row to 'X~'.
Row keys are stored as bytes and sorted lexicographically. The start and stop keys let HBase skip the full table scan and read only from the regions that contain the key range. Because '~' is the last printable character in the ASCII table, the range from X to X~ covers the row keys that begin with X.
Retrieving data with a bounded scan and a filter:

    Scan scan = new Scan(Bytes.toBytes("23989"), Bytes.toBytes("23989~"));
    scan.setFilter(...);
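To see why a stop row of 'X~' bounds the scan, the sketch below sorts a few row keys byte-wise and selects the half-open range between a start and stop row. It is plain Ruby, not HBase code: `scan_range` is a hypothetical helper, and three of the keys are made up for illustration. It also hints at one caveat, assuming card numbers can differ in length: a start row without the `_` delimiter also matches longer card numbers that share the prefix.

```ruby
# The first two keys come from the sample table; the others are hypothetical.
keys = [
  '100_2014_06_10_10_932845_ta',
  '23989_2000_01_11_10_5468756_ta',
  '239890_2001_03_02_09_111_ta',   # hypothetical: longer card number, same prefix
  '2399_2010_01_01_01_1_ta',       # hypothetical
  '24000_2012_05_05_05_5_ta'       # hypothetical
].sort                             # byte-wise order, like HBase row keys

# Hypothetical helper: start row inclusive, stop row exclusive.
def scan_range(sorted_keys, start_row, stop_row)
  sorted_keys.select { |k| k >= start_row && k < stop_row }
end

# '~' (0x7E) sorts after digits, letters and '_', so it closes the prefix range.
loose = scan_range(keys, '23989', '23989~')
# Including the '_' delimiter in the start row excludes card 239890.
tight = scan_range(keys, '23989_', '23989~')

puts loose.inspect  # ["239890_2001_03_02_09_111_ta", "23989_2000_01_11_10_5468756_ta"]
puts tight.inspect  # ["23989_2000_01_11_10_5468756_ta"]
```

Keys whose byte after the prefix is `~` or higher (for example non-ASCII bytes) would fall outside this range, so the trick assumes plain ASCII row keys.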
  • Disable caching at the client:

and setCaching(0)

  • Get all rows for account number 23989:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.RowFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    scan 'TS', {STARTROW => '23989', STOPROW => '23989~', FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('23989'))}

Use start and stop rows to optimize scan queries.

  • Count all rows:

    count 'TS', INTERVAL => 10000, CACHE => 1000

Decrease the CACHE value if rows are very large.
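The trade-off behind CACHE can be estimated with simple arithmetic. The sketch below is a back-of-envelope model, not a description of how HBase accounts for memory: it assumes CACHE is the number of rows fetched per round trip, and `count_cost`, the row counts and the byte sizes are all hypothetical values chosen for illustration.

```ruby
# Rough cost model for a count over total_rows rows (all inputs are assumptions).
def count_cost(total_rows:, cache:, avg_row_bytes:)
  {
    round_trips: (total_rows + cache - 1) / cache,  # ceiling division: one fetch per CACHE rows
    batch_bytes: cache * avg_row_bytes              # data buffered by the client per fetch
  }
end

small_rows = count_cost(total_rows: 1_000_000, cache: 1000, avg_row_bytes: 200)
large_rows = count_cost(total_rows: 1_000_000, cache: 1000, avg_row_bytes: 5_000_000)
reduced    = count_cost(total_rows: 1_000_000, cache: 10,   avg_row_bytes: 5_000_000)

puts small_rows.inspect
puts large_rows.inspect
puts reduced.inspect
```

With small rows, a large CACHE keeps round trips low at little memory cost. With very large rows, the same CACHE makes each batch enormous, and lowering it trades more round trips for a smaller batch.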