How to use Hadoop Archive (HAR) to handle small files?
Published: 2016-12-08 | Source: Internet
Answered on 2019-09-11 08:43:50
There are quite a few ways to handle this. Here is one simple example to illustrate:
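Use the hadoop archive command, which runs a MapReduce job, to pack the small files into a .har archive. (HAR only bundles the files together; the data itself is not compressed.)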
1. Test source files on HDFS:
/test/lizhao/2019-01-13/*
/test/lizhao/2019-01-14/*
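If you want to reproduce a test layout like this, a sketch along these lines could create one day directory full of tiny files. The helper name make_test_dir, the file names and contents, and the local temp directory are assumptions for illustration, not taken from the original; it relies only on the standard hadoop fs -mkdir -p and hadoop fs -put -f commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): create COUNT tiny text
# files locally, then upload them into one HDFS day directory, e.g.
#   make_test_dir /test/lizhao/2019-01-13 6
make_test_dir() {
  local hdfs_dir=$1 count=$2
  local tmp i
  tmp=$(mktemp -d) || return 1
  for ((i = 1; i <= count; i++)); do
    # Assumed payload; any few bytes will do for a small-files test.
    echo "test line $i" > "$tmp/$i.txt"
  done
  # -p creates missing parent directories; -f overwrites existing files.
  hadoop fs -mkdir -p "$hdfs_dir" &&
    hadoop fs -put -f "$tmp"/*.txt "$hdfs_dir"/
  local rc=$?
  rm -rf "$tmp"   # clean up the local staging directory either way
  return "$rc"
}
```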
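2. Archive command syntax: hadoop archive -archiveName NAME -p <parent path> [-r <replication factor>] <src>* <dest>
The archive name should end in .har, and each <src> is given relative to <parent path>, which is why the example passes 2019-01-13 and 2019-01-14 instead of full paths: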
>>> hadoop archive -archiveName 2019-01.har -p /test/lizhao 2019-01-13 2019-01-14 /test/lizhao/
19/01/14 14:11:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/01/14 14:11:55 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032
19/01/14 14:11:56 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032
19/01/14 14:11:56 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032
19/01/14 14:11:56 INFO mapreduce.JobSubmitter: number of splits:1
19/01/14 14:11:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533867597475_0001
19/01/14 14:11:58 INFO impl.YarnClientImpl: Submitted application application_1533867597475_0001
19/01/14 14:11:58 INFO mapreduce.Job: The url to track the job: http://ic-1:8088/proxy/application_1533867597475_0001/
19/01/14 14:11:58 INFO mapreduce.Job: Running job: job_1533867597475_0001
19/01/14 14:12:07 INFO mapreduce.Job: Job job_1533867597475_0001 running in uber mode : false
19/01/14 14:12:07 INFO mapreduce.Job: map 0% reduce 0%
19/01/14 14:12:13 INFO mapreduce.Job: map 100% reduce 0%
19/01/14 14:12:24 INFO mapreduce.Job: map 100% reduce 100%
19/01/14 14:12:24 INFO mapreduce.Job: Job job_1533867597475_0001 completed successfully
19/01/14 14:12:24 INFO mapreduce.Job: Counters: 49
*****
Map-Reduce Framework
Map input records=15
Map output records=15
Map output bytes=1205
Map output materialized bytes=1241
Input split bytes=116
Combine input records=0
Combine output records=0
Reduce input groups=15
Reduce shuffle bytes=1241
Reduce input records=15
Reduce output records=0
Spilled Records=30
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=137
CPU time spent (ms)=6370
Physical memory (bytes) snapshot=457756672
Virtual memory (bytes) snapshot=3200942080
Total committed heap usage (bytes)=398458880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=995
File Output Format Counters
Bytes Written=0
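With one directory per day, listing every <src> by hand gets tedious. A sketch like the following could collect all day directories of one month and pass them to hadoop archive, including the optional -r replication factor from the syntax above. archive_month is a hypothetical helper, and it assumes day directories are named YYYY-MM-DD directly under the parent path, as in this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive_month PARENT MONTH DEST [REPLICATION]
# Equivalent to the command above:
#   archive_month /test/lizhao 2019-01 /test/lizhao/
# Assumes day directories named MONTH-DD directly under PARENT.
archive_month() {
  local parent=$1 month=$2 dest=$3 repl=${4:-}
  local -a srcs=() opts=()
  local path
  # -d lists the matching directories themselves, not their contents.
  # Keep directory lines (permissions start with "d") and print the last
  # column, which is the path.
  while read -r path; do
    srcs+=("${path##*/}")   # <src> relative to -p, e.g. 2019-01-13
  done < <(hadoop fs -ls -d "$parent/$month-*" | awk '$1 ~ /^d/ {print $NF}')
  if [ "${#srcs[@]}" -eq 0 ]; then
    echo "no directories match $parent/$month-*" >&2
    return 1
  fi
  if [ -n "$repl" ]; then
    opts=(-r "$repl")
  fi
  # Same argument order as the documented syntax: name, -p, [-r], srcs, dest.
  hadoop archive -archiveName "$month.har" -p "$parent" \
    "${opts[@]}" "${srcs[@]}" "$dest"
}
```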
3. View the archived files:
>>> hadoop fs -ls har:///test/lizhao/2019-01.har
drwxr-xr-x - root supergroup 0 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13
drwxr-xr-x - root supergroup 0 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-14
>>> hadoop fs -ls har:///test/lizhao/2019-01.har/2019-01-13
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/1.txt
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/2.txt
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/3.txt
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/5.txt
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/6.txt
-rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/7.txt
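hadoop archive leaves the source directories in place, so the NameNode only sheds the small-file metadata once you delete the originals yourself. Before doing that, it is worth checking that every source file made it into the HAR. A minimal sketch, assuming a comparison of relative paths only (not file contents) and assuming hadoop fs -ls -R prints har paths with the same prefix you pass in, as the listing above does; list_rel and verify_har are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they compare file paths only, not file contents.

# list_rel DIR BASE: recursively list regular files under DIR and print each
# path with the leading "BASE/" stripped, sorted.
list_rel() {
  local dir=$1 base=$2
  hadoop fs -ls -R "$dir" |
    awk -v b="$base/" '$1 ~ /^-/ {
      p = $NF
      if (index(p, b) == 1) p = substr(p, length(b) + 1)
      print p
    }' | sort
}

# verify_har HAR_URI PARENT SRC...: same parent and sources as the archive call,
#   verify_har har:///test/lizhao/2019-01.har /test/lizhao 2019-01-13 2019-01-14
verify_har() {
  local har=$1 parent=$2 d src_list har_list
  shift 2
  src_list=$(for d in "$@"; do list_rel "$parent/$d" "$parent"; done | sort)
  har_list=$(list_rel "$har" "$har")
  if [ "$src_list" = "$har_list" ]; then
    echo "OK: $(printf '%s\n' "$har_list" | grep -c .) files match"
  else
    echo "MISMATCH between source and $har:" >&2
    diff <(printf '%s\n' "$src_list") <(printf '%s\n' "$har_list") >&2
    return 1
  fi
}
```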
4. Download files from the HAR:
hadoop fs -get har:///test/lizhao/2019
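Because har:// paths work transparently with the fs shell, unarchiving is just a copy. Besides -get to the local disk, the Hadoop Archives documentation describes copying back into HDFS either sequentially with a plain copy or in parallel with distcp. A small wrapper sketch; unhar and its mode names are hypothetical, and the har:// path in the comment is taken from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: unhar MODE SRC DEST
#   local    -> hadoop fs -get  (copy out of the HAR to the local filesystem)
#   hdfs     -> hadoop fs -cp   (sequential copy back into HDFS)
#   parallel -> hadoop distcp   (parallel copy run as a MapReduce job)
# SRC is a har:// path, e.g. har:///test/lizhao/2019-01.har/2019-01-13
unhar() {
  local mode=$1 src=$2 dest=$3
  case "$src" in
    har://*) ;;
    *) echo "SRC should be a har:// path: $src" >&2; return 2 ;;
  esac
  case "$mode" in
    local)    hadoop fs -get "$src" "$dest" ;;
    hdfs)     hadoop fs -cp "$src" "$dest" ;;
    parallel) hadoop distcp "$src" "$dest" ;;
    *) echo "unknown mode: $mode (use local, hdfs or parallel)" >&2; return 2 ;;
  esac
}
```

For a handful of files the local or hdfs mode is simplest; distcp mainly pays off when restoring a whole archive with many entries.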