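Before we start:
Make sure the Hadoop version and the versions of all the related jars are consistent; otherwise you may run into all kinds of baffling errors!
Maven dependencies: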
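<project>
    <modelVersion>4.0.0</modelVersion>
    <packaging>jar</packaging>
    <groupId>BaseTecLearn</groupId>
    <artifactId>BaseTecLearn</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
            <version>0.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.4</version>
        </dependency>
    </dependencies>
</project>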
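Since mismatched versions are the main pitfall here, a rough sanity check can help. The sketch below is only an illustration: `DependencyVersionCheck` and `mismatchedGroups` are hypothetical names, not part of the project, and grouping by `groupId` is a simplifying assumption (different artifacts under one group can legitimately carry different versions). It takes the coordinates from the dependency list above as `groupId:artifactId:version` strings and reports any group that declares more than one version. For a real project, `mvn dependency:tree` shows the versions Maven actually resolves.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it is not part of the original project.
class DependencyVersionCheck {

    // Groups "groupId:artifactId:version" coordinates by groupId and returns
    // only the groups that declare more than one distinct version.
    static Map<String, TreeSet<String>> mismatchedGroups(List<String> coordinates) {
        Map<String, TreeSet<String>> versionsByGroup = new LinkedHashMap<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                        "Expected groupId:artifactId:version, got " + coordinate);
            }
            versionsByGroup.computeIfAbsent(parts[0], g -> new TreeSet<>()).add(parts[2]);
        }
        // Drop groups whose artifacts all share a single version.
        versionsByGroup.values().removeIf(versions -> versions.size() < 2);
        return versionsByGroup;
    }

    public static void main(String[] args) {
        // Coordinates copied from the dependency list in this post.
        List<String> coordinates = Arrays.asList(
                "org.apache.spark:spark-core_2.11:2.2.0",
                "org.apache.spark:spark-sql_2.11:2.2.0",
                "org.apache.thrift:libthrift:0.6.1",
                "org.apache.hadoop:hadoop-common:2.7.1",
                "org.apache.hadoop:hadoop-mapreduce-client-core:2.7.4");

        mismatchedGroups(coordinates).forEach((group, versions) ->
                System.out.println(group + " declares mixed versions: " + versions));
        // prints: org.apache.hadoop declares mixed versions: [2.7.1, 2.7.4]
    }
}
```

Run against the list above, the check flags the two Hadoop artifacts (2.7.1 and 2.7.4), which is exactly the kind of mismatch worth aligning before debugging anything else.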
Configure logging under the resources directory (this is important: it lets you see warnings and similar messages):
log4j.rootLogger=WARN,stdout,logfile
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=hadoop.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
package top.letsgogo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper