(1)hadoop distcp
distcp can only copy files between clusters; it cannot extract and copy selected columns of a Hive table.
The official documentation for the distcp command:
https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html
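For reference, a plain file-level copy of the test table's warehouse directory between the two clusters used in this article would look roughly like this (the paths are assumptions based on the DataX job below):

    hadoop distcp hdfs://192.168.73.128:8020/user/hive/warehouse/test hdfs://hadoop01:8020/user/hive/warehouse/test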
(2)datax
Use datax to sync the Hive table test on the cluster hdfs://192.168.73.128:8020 to the Hive table hdfs2hdfs on the cluster hdfs://hadoop01:8020.
Note: before syncing, the hdfs2hdfs table must already exist in the target cluster's Hive.
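A minimal DDL sketch for that table, assuming plain-text storage and inferring the columns and field delimiter from the hdfswriter configuration below:

    CREATE TABLE hdfs2hdfs (
        word STRING,
        cnt  INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE;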
{ "job": { "content": [ { "reader": { "name": "hdfsreader", "parameter": { "column": ["*"], "defaultFS": "hdfs://192.168.73.128:8020", "encoding": "UTF-8", "fieldDelimiter": "|", "fileType": "text", "path": "/user/hive/warehouse/test" } }, "writer": { "name": "hdfswriter", "parameter": { "column": [ { "name":"word", "type":"STRING" }, { "name":"cnt", "type":"INT" } ], "defaultFS": "hdfs://hadoop01:8020", "encoding": "UTF-8", "fieldDelimiter": "|", "fileType": "text", "path": "/user/hive/warehouse/hdfs2hdfs", "fileName": "hdfs2hdfs", "writeMode": "append", "compress": "GZIP" } } } ], "setting": { "speed": { "channel": "1" } } } }
(3)sqoop
With sqoop2 you can configure an hdfs connector to sync from HDFS to HDFS.
When creating the job, you can specify which table and columns to sync,
as shown below:
Creating job for links with from id 1 and to id 6
Please fill following values to create new job object
Name: mysql_openfire                                    -- job name

FromJob configuration
Schema name:(Required) sqoop                            -- database name: required
Table name:(Required) sqoop                             -- table name: required
Table SQL statement:(Optional)                          -- optional
Table column names:(Optional)                           -- optional
Partition column name:(Optional) id                     -- optional
Null value allowed for the partition column:(Optional)  -- optional
Boundary query:(Optional)                               -- optional

ToJob configuration
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0                                               -- choose the output file format
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0                                               -- choose the compression type
Custom compression format:(Optional)                    -- optional
Output directory: hdfs:/ns1/sqoop                       -- HDFS output directory (destination)

Driver Config
Extractors: 2                                           -- number of extractors
Loaders: 2                                              -- number of loaders

New job was successfully created with validation status OK and persistent id 1
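That prompt sequence comes from the sqoop2 interactive shell; a rough sketch of the surrounding commands (the link ids 1 and 6 match the transcript above, and the job id 1 comes from its last line):

    sqoop2-shell
    sqoop:000> create job -f 1 -t 6
    sqoop:000> start job -j 1
    sqoop:000> status job -j 1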
For detailed steps see:
https://www.iteye.com/blog/muruiheng-2269162
Another issue to consider is Kerberos authentication.
(1)datax: Kerberos credentials can be configured; see the hdfsreader documentation on the DataX site:
https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md
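According to that document, hdfsreader (and hdfswriter) accept three Kerberos-related parameters that go in the reader/writer parameter block; the keytab path and principal here are placeholders:

    "haveKerberos": true,
    "kerberosKeytabFilePath": "/etc/security/keytabs/hive.keytab",
    "kerberosPrincipal": "hive@EXAMPLE.COM"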
(2)sqoop: Sqoop submits its work through the Hadoop client, so on a Kerberos-enabled cluster the usual approach is to obtain a ticket with kinit before running the job; for sqoop2 the server side additionally needs Kerberos settings in its own configuration.