1) What is the impact?
(1) How much NameNode memory does one file block occupy? Roughly 150 bytes per metadata object.
100 million small files * 150 bytes ≈ 15 GB of NameNode memory
1 file block * 150 bytes
How many file blocks can 128 GB of memory hold? 128 * 1024 * 1024 * 1024 bytes / 150 bytes ≈ 900 million file blocks
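The figures above can be reproduced with plain shell arithmetic (a minimal sketch; the 150-bytes-per-object cost is the rule of thumb from above, not an exact measurement):
echo $((100000000 * 150))                   # 15000000000 bytes, roughly 15 GB of NameNode heap for 100 million objects
echo $((128 * 1024 * 1024 * 1024 / 150))    # 916259689, roughly 0.9 billion metadata objects for a 128 GB heap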
2) How to solve it
(1) Use HAR archiving (hadoop archive) to pack the small files into an archive:
hadoop archive -archiveName 20200701.har -p /user/hadoop/login/202007/01 (source directory) /user/hadoop/login/202007/01 (target directory)
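Once the archive exists, its contents can be listed in place through the har:// filesystem; a minimal sanity check, assuming the paths from the example above:
hdfs dfs -ls har:///user/hadoop/login/202007/01/20200701.har
Note that archiving does not delete the originals; NameNode memory is only reclaimed after the source small files are removed.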
CREATE EXTERNAL TABLE login_har(
  ldate string,
  ltime string,
  userid int,
  name string)
PARTITIONED BY (
  ym string,
  d string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ' '
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://h60:9000/user/hadoop/login';
First create the external table over the parent directory, partitioned by year-month and day (PARTITIONED BY), then manually add the partition and point its LOCATION at the archive:
alter table login_har add partition(ym='202007',d='01') LOCATION 'har:///flume/loginlog/202007/01/20200701.har';
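A quick check that the partition is registered and reads from the archive (a sketch assuming the table and partition values defined above):
SHOW PARTITIONS login_har;
SELECT * FROM login_har WHERE ym='202007' AND d='01' LIMIT 10;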