HBase table size on HDFS is 4x the actual input file


I'm new to this forum and to HDFS/HBase.

I have created a table in HBase on HDFS. The file I loaded had 10 million records and was 1 GB on my Windows disk. After loading, the size of the table in HDFS is:

root@narmada:~/agni/hdfs/hadoop-1.1.2# ./bin/hadoop fs -dus /hbase/hdfs_10m
hdfs://192.168.5.58:54310/hbase/hdfs_10m       4143809619

Can anyone please help me reduce the size?

Table details:

DESCRIPTION                                                             ENABLED
 'hdfs_10m', {NAME => 'v', DATA_BLOCK_ENCODING => 'NONE',               true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.2340 seconds
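Since the column family was created with COMPRESSION => 'NONE', one straightforward way to shrink the on-disk footprint is to enable compression on it. A minimal sketch using the HBase shell; the table name 'hdfs_10m' and family 'v' are taken from the description above, and GZ is used because it ships with HBase (SNAPPY is faster but needs native libraries installed):

hbase> disable 'hdfs_10m'                                  # take the table offline before altering
hbase> alter 'hdfs_10m', {NAME => 'v', COMPRESSION => 'GZ'}  # set the codec on family 'v'
hbase> enable 'hdfs_10m'
hbase> major_compact 'hdfs_10m'                            # rewrite existing store files with the new codec

The alter only changes the schema; data already on disk is rewritten with the new compression only after the major compaction runs.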

Generally, when you load a file onto HDFS, it is divided into blocks of equal size; the default block size is 64 MB. Hadoop also keeps 3 replicas of each block, which means that to store a 1 TB file on HDFS you need hardware for 3 TB. Each block is stored on 3 different data nodes.

Ref: http://hadooptutor.blogspot.com/2013/07/replication.html

If you don't need replication of your data, place the following property in your Hadoop and HBase config files (hdfs-site.xml and hbase-site.xml):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
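Note that dfs.replication only affects files written after the change; blocks already in HDFS keep their original replication factor. Assuming the same table path as above, something like this lowers it for the existing data:

root@narmada:~/agni/hdfs/hadoop-1.1.2# ./bin/hadoop fs -setrep -R 1 /hbase/hdfs_10m

Be aware that with a replication factor of 1 there is no redundancy, so losing a single data node means losing data.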
