This installation and configuration guide is for the Windows XP OS.
Prepare three software packages:
1. Cygwin (http://cygwin.com/setup.exe)
2. Hadoop (http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz)
3. JDK (version 6 or above)
Install Cygwin under the D:\ directory.
Extract Hadoop under D:\cygwin.
Install the JDK under C:\.
Then do the configuration: add the commands below to .bashrc:

export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_03
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar

Additionally, under hadoop/conf we need to modify conf/hadoop-env.sh to configure JAVA_HOME:

export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_03

Configuration is done.
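As a quick sanity check, the exports above can be collected into one fragment and echoed back. This is just a sketch: the JDK path is the one used in this guide, so adjust it to wherever your JDK actually lives.

```shell
# Sketch of the .bashrc additions; the JDK path below matches this guide's
# install location (C:\Java\jdk1.7.0_03 seen through Cygwin) and should be
# adjusted to your own.
JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_03
export JAVA_HOME
export PATH="$JAVA_HOME/bin:$PATH"
export CLASSPATH="$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar"

# Print the values so a typo (for example a doubled '=') is caught immediately.
echo "JAVA_HOME=$JAVA_HOME"
echo "CLASSPATH=$CLASSPATH"
```

Running `source ~/.bashrc` afterwards (or opening a new Cygwin terminal) makes the variables take effect.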
$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
Next, we can run the wordcount example program:
1. Create an input folder (the program will create the output folder automatically).
2. Put some test files into the input folder.
3. Run the example:

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
12/03/05 04:05:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/05 04:05:43 INFO input.FileInputFormat: Total input paths to process : 1
12/03/05 04:05:44 INFO mapred.JobClient: Running job: job_local_0001
12/03/05 04:05:44 INFO input.FileInputFormat: Total input paths to process : 1
12/03/05 04:05:44 INFO mapred.MapTask: io.sort.mb = 100
12/03/05 04:05:44 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/05 04:05:44 INFO mapred.MapTask: record buffer = 262144/327680
12/03/05 04:05:44 INFO mapred.MapTask: Starting flush of map output
12/03/05 04:05:44 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Expecting a line not the end of stream
        at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/05 04:05:45 INFO mapred.JobClient:  map 0% reduce 0%
12/03/05 04:05:45 INFO mapred.JobClient: Job complete: job_local_0001
12/03/05 04:05:45 INFO mapred.JobClient: Counters: 0

The problem above can be solved by configuring LANG:

export LANG=en.utf8

Rerunning the job then succeeds:

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
12/03/05 04:07:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/05 04:07:18 INFO input.FileInputFormat: Total input paths to process : 1
12/03/05 04:07:19 INFO mapred.JobClient: Running job: job_local_0001
12/03/05 04:07:19 INFO input.FileInputFormat: Total input paths to process : 1
12/03/05 04:07:19 INFO mapred.MapTask: io.sort.mb = 100
12/03/05 04:07:19 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/05 04:07:19 INFO mapred.MapTask: record buffer = 262144/327680
12/03/05 04:07:19 INFO mapred.MapTask: Starting flush of map output
12/03/05 04:07:19 INFO mapred.MapTask: Finished spill 0
12/03/05 04:07:19 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/03/05 04:07:19 INFO mapred.LocalJobRunner:
12/03/05 04:07:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/03/05 04:07:19 INFO mapred.LocalJobRunner:
12/03/05 04:07:19 INFO mapred.Merger: Merging 1 sorted segments
12/03/05 04:07:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 5204 bytes
12/03/05 04:07:19 INFO mapred.LocalJobRunner:
12/03/05 04:07:19 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/03/05 04:07:19 INFO mapred.LocalJobRunner:
12/03/05 04:07:19 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/03/05 04:07:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
12/03/05 04:07:19 INFO mapred.LocalJobRunner: reduce > reduce
12/03/05 04:07:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/03/05 04:07:20 INFO mapred.JobClient:  map 100% reduce 100%
12/03/05 04:07:20 INFO mapred.JobClient: Job complete: job_local_0001
12/03/05 04:07:20 INFO mapred.JobClient: Counters: 12
12/03/05 04:07:20 INFO mapred.JobClient:   FileSystemCounters
12/03/05 04:07:20 INFO mapred.JobClient:     FILE_BYTES_READ=325874
12/03/05 04:07:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=356160
12/03/05 04:07:20 INFO mapred.JobClient:   Map-Reduce Framework
12/03/05 04:07:20 INFO mapred.JobClient:     Reduce input groups=383
12/03/05 04:07:20 INFO mapred.JobClient:     Combine output records=383
12/03/05 04:07:20 INFO mapred.JobClient:     Map input records=75
12/03/05 04:07:20 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/03/05 04:07:20 INFO mapred.JobClient:     Reduce output records=383
12/03/05 04:07:20 INFO mapred.JobClient:     Spilled Records=766
12/03/05 04:07:20 INFO mapred.JobClient:     Map output bytes=6912
12/03/05 04:07:20 INFO mapred.JobClient:     Combine input records=663
12/03/05 04:07:20 INFO mapred.JobClient:     Map output records=663
12/03/05 04:07:20 INFO mapred.JobClient:     Reduce input records=383
OK! Let's look at the result:
$ cat part-r-00000
"Glory 1
"Grandiose 1
"I 1
"Putin 1
"Putinism", 1
"These 1
"We 4
"every 1
"the 1
"unfair 1
"would 1
'victory' 2
(14:00 1
- 1
-------------------------------------------------------------------------------- 1
17%. 1
18:00 1
2008 1
58.3% 1
6,000 1
60% 2
62.3%. 1
64%, 1
Alexey 1
Analysis 1
BBC 1
BBC: 1
Bridget 1
But 2
Continue 2
December's 1
December, 1
Diplomatic 1
Dmitry 1
ElectionRussia 1
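Outside Hadoop, the per-word counts that wordcount produces can be approximated with a plain shell pipeline. This is only an illustration of what the map phase (split lines into words) and reduce phase (count occurrences of each word) compute; the sample text below is made up, not the BBC article used above.

```shell
# Prepare a tiny made-up input, mirroring the tutorial's folder layout.
mkdir -p input
printf 'hello hadoop\nhello world\n' > input/test.txt

# map: emit one word per line; reduce: count occurrences of each word.
# uniq -c needs sorted input, which plays the role of the shuffle/sort phase.
tr -s ' ' '\n' < input/test.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# prints (tab-separated, like part-r-00000): hadoop 1, hello 2, world 1
```

The `word<TAB>count` lines match the format of the `part-r-00000` file shown above, which is why punctuation-attached tokens such as `"Putin` and `'victory'` appear as distinct words in the real output.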