This article was tested against HBase 0.96.0; in principle it applies to HBase 0.94 and later.
HBase has two kinds of coprocessors:
1. RegionObserver: analogous to a trigger in a relational database.
2. Endpoint: analogous to a stored procedure in a relational database. This article covers this kind of coprocessor.
An Endpoint lets you define your own dynamic RPC protocol between clients and region servers. A coprocessor runs in the same process space as the region server, so you can define your own region-side methods (endpoints) and push computation down to the regions, cutting network overhead. Endpoints are commonly used to extend HBase with operations such as count and sum.
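To see why pushing the count down to the regions helps, here is a rough back-of-envelope sketch. The row size and region count below are illustrative assumptions, not measurements: a client-side scan ships every row over the network, while a counting endpoint ships only one 8-byte long per region.

```java
public class TransferEstimate {

    /** Bytes shipped by a client-side scan: every row crosses the network. */
    public static long scanBytes(long rows, long bytesPerRow) {
        return rows * bytesPerRow;
    }

    /** Bytes shipped by a counting endpoint: one 8-byte count per region. */
    public static long endpointBytes(long regions) {
        return regions * 8L;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 10 million rows of ~100 bytes across 20 regions.
        System.out.println(scanBytes(10_000_000L, 100L)); // prints 1000000000
        System.out.println(endpointBytes(20L));           // prints 160
    }
}
```

The ratio, not the absolute numbers, is the point: the endpoint's traffic grows with the number of regions, the scan's with the number of rows.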
Taking count as the example, this article walks through implementing a custom endpoint.
1. Define a protocol buffer Service
1.1 Install protobuf
Download protoc-2.5.0-win32.zip (pick the build matching your OS) and unzip it;
Copy protoc.exe from protoc-2.5.0-win32 into c:\windows\system32;
Copy protoc.exe into the XXX\protobuf-2.5.0\src directory of the unzipped source.
Reference: http://shuofenglxy.iteye.com/blog/1512980
1.2 Define the .proto file, which declares the messages and the service.
CXKTest.proto:
```protobuf
option java_package = "com.cxk.coprocessor.test.generated";
option java_outer_classname = "CXKTestProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message CountRequest {
}

message CountResponse {
  required int64 count = 1 [default = 0];
}

service RowCountService {
  rpc getRowCount(CountRequest)
    returns (CountResponse);
}
```
Reference: https://developers.google.com/protocol-buffers/docs/proto#services
Run the command: `protoc --java_out=. CXKTest.proto`
This generates the CXKTestProtos class under com.cxk.coprocessor.test.generated.
2. Define your own Endpoint class (implementing the service method)
RowCountEndpoint.java:
```java
package com.cxk.coprocessor.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

public class RowCountEndpoint extends CXKTestProtos.RowCountService
        implements Coprocessor, CoprocessorService {

    private RegionCoprocessorEnvironment env;

    public RowCountEndpoint() {
    }

    @Override
    public Service getService() {
        return this;
    }

    /**
     * Counts the total number of rows in this region.
     */
    @Override
    public void getRowCount(RpcController controller, CXKTestProtos.CountRequest request,
            RpcCallback<CXKTestProtos.CountResponse> done) {
        Scan scan = new Scan();
        scan.setFilter(new FirstKeyOnlyFilter());
        CXKTestProtos.CountResponse response = null;
        InternalScanner scanner = null;
        try {
            scanner = env.getRegion().getScanner(scan);
            List<Cell> results = new ArrayList<Cell>();
            boolean hasMore = false;
            byte[] lastRow = null;
            long count = 0;
            do {
                hasMore = scanner.next(results);
                for (Cell kv : results) {
                    byte[] currentRow = CellUtil.cloneRow(kv);
                    // Count a row only when the row key changes.
                    if (lastRow == null || !Bytes.equals(lastRow, currentRow)) {
                        lastRow = currentRow;
                        count++;
                    }
                }
                results.clear();
            } while (hasMore);
            response = CXKTestProtos.CountResponse.newBuilder()
                    .setCount(count).build();
        } catch (IOException ioe) {
            ResponseConverter.setControllerException(controller, ioe);
        } finally {
            if (scanner != null) {
                try {
                    scanner.close();
                } catch (IOException ignored) {
                }
            }
        }
        done.run(response);
    }

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        if (env instanceof RegionCoprocessorEnvironment) {
            this.env = (RegionCoprocessorEnvironment) env;
        } else {
            throw new CoprocessorException("Must be loaded on a table region!");
        }
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        // nothing to do
    }
}
```
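The scan loop counts a row only when the row key changes, rather than counting returned cells directly. That deduplication logic is plain Java and can be checked in isolation; the minimal sketch below mirrors it with HBase's Cell type replaced by a list of raw row keys (an assumption made purely to keep the example self-contained):

```java
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {

    /** Count distinct consecutive row keys, mirroring the loop in getRowCount. */
    public static long countDistinctRows(List<byte[]> rowKeys) {
        byte[] lastRow = null;
        long count = 0;
        for (byte[] currentRow : rowKeys) {
            // Only a change of row key starts a new row.
            if (lastRow == null || !Arrays.equals(lastRow, currentRow)) {
                lastRow = currentRow;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two cells from row "r1", one from row "r2": two distinct rows.
        List<byte[]> cells = Arrays.asList(
                "r1".getBytes(), "r1".getBytes(), "r2".getBytes());
        System.out.println(countDistinctRows(cells)); // prints 2
    }
}
```

This only works because a region scanner returns cells in row-key order, so equal row keys are always adjacent.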
3. Implement your own client
TestEndPoint.java:
```java
package com.test;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;

import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.cxk.coprocessor.test.generated.CXKTestProtos.RowCountService;
import com.google.protobuf.ServiceException;

public class TestEndPoint {

    /**
     * @param args args[0] = hbase master ip, args[1] = zookeeper quorum, args[2] = table name
     * @throws ServiceException
     * @throws Throwable
     */
    public static void main(String[] args) throws ServiceException, Throwable {
        System.out.println("begin.....");
        long beginTime = System.currentTimeMillis();
        Configuration config = HBaseConfiguration.create();
        // String masterIp = "192.168.150.128";
        String masterIp = args[0];
        String zkIp = args[1];
        String tableName = args[2];
        config.set("hbase.zookeeper.property.clientPort", "2181");
        config.set("hbase.zookeeper.quorum", zkIp);
        config.set("hbase.master", masterIp + ":60000");
        final CXKTestProtos.CountRequest request = CXKTestProtos.CountRequest.getDefaultInstance();
        HTable table = new HTable(config, tableName);
        // null start/stop row: invoke the endpoint on every region of the table.
        Map<byte[], Long> results = table.coprocessorService(RowCountService.class,
                null, null,
                new Batch.Call<CXKTestProtos.RowCountService, Long>() {
                    public Long call(CXKTestProtos.RowCountService counter) throws IOException {
                        ServerRpcController controller = new ServerRpcController();
                        BlockingRpcCallback<CXKTestProtos.CountResponse> rpcCallback =
                                new BlockingRpcCallback<CXKTestProtos.CountResponse>();
                        counter.getRowCount(controller, request, rpcCallback);
                        CXKTestProtos.CountResponse response = rpcCallback.get();
                        if (controller.failedOnException()) {
                            throw controller.getFailedOn();
                        }
                        return (response != null && response.hasCount()) ? response.getCount() : 0;
                    }
                });
        table.close();
        if (results.size() > 0) {
            System.out.println(results.values());
        } else {
            System.out.println("no results returned");
        }
        long endTime = System.currentTimeMillis();
        System.out.println("end:" + (endTime - beginTime));
    }
}
```
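coprocessorService returns one Map entry per region, so the client above prints a list of per-region counts rather than a single total. Summing the values yields the table-wide row count. A minimal sketch of that aggregation step (the byte[] keys stand in for region names; no HBase types are needed):

```java
import java.util.HashMap;
import java.util.Map;

public class CountAggregator {

    /** Sum the per-region counts returned by coprocessorService. */
    public static long total(Map<byte[], Long> perRegion) {
        long sum = 0;
        for (long regionCount : perRegion.values()) {
            sum += regionCount;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<byte[], Long> results = new HashMap<byte[], Long>();
        results.put("region-a".getBytes(), 100L);
        results.put("region-b".getBytes(), 42L);
        System.out.println(total(results)); // prints 142
    }
}
```

In the real client this loop would replace the `System.out.println(results.values())` call.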
4. Deploy the endpoint
There are two ways to deploy an endpoint: the first edits hbase-site.xml, which loads the endpoint for every table; the second alters a single table, so the endpoint is loaded only for that table.
4.1 Modify hbase-site.xml
Add the following to hbase-site.xml (the endpoint class must also be on the region servers' classpath, and the region servers restarted for the change to take effect):
```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.cxk.coprocessor.test.RowCountEndpoint</value>
  <description>A comma-separated list of coprocessors that are loaded by
    default on all tables. For any override of a coprocessor method from
    RegionObserver or Coprocessor, these classes' implementations are
    called in order. After implementing your own coprocessor, put it on
    HBase's classpath and add the fully qualified class name here.
  </description>
</property>
```
4.2 Alter the table in the hbase shell
A. Package CXKTestProtos.java and RowCountEndpoint.java into a jar and upload it to HDFS;
B. Disable the table:

```
disable 'test'
```

C. Attach the coprocessor and re-enable the table (the attribute value is 'jar path|class name|priority|arguments'):

```
alter 'test','coprocessor'=>'hdfs:///user/hadoop/test/coprocessor/cxkcoprocessor.1.01.jar|com.cxk.coprocessor.test.RowCountEndpoint|1001|arg1=1,arg2=2'
enable 'test'
```
5. Run the client
Package TestEndPoint.java into a jar and run it with:

```
java -jar test.cxk.endpiont.1.03.jar ip1 ip2 test
```

Note: if your Eclipse setup can run Hadoop code directly, you can also run the test class straight from the IDE.
References:
http://hbase.apache.org/devapidocs/index.html