The input data is in the following format:
the key is the imageid for which the nearest neighbor will be computed,
and the value is a 100-dimensional vector of floating-point values separated by spaces or tabs.
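For example, an input line would look something like the following (the imageid and the values here are made up, and most of the 100 values are elided):

img_00042	0.1273 0.8931 0.0042 ... 0.5620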
The mapper reads in the query (the query is a 100-dimensional vector) and each line of the input, and outputs a <key2, value2> pair,
where key2 is a floating-point value indicating the distance and value2 is the imageid.
The number of reducers is set to 1, and the reducer is set to be the identity reducer.
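To make the intended mapper concrete, here is a minimal sketch in Python; the names mapper.py and query.txt, the assumption that the query file is shipped to the nodes with -file, and the choice of Euclidean distance are all illustrative assumptions, not necessarily what my actual mapper does:

#!/usr/bin/env python
# mapper.py -- hypothetical name; shipped with the job via -file mapper.py
import sys
import math

# Assumption: the query vector is available as a local file "query.txt"
# (shipped with -file query.txt) containing 100 whitespace-separated floats.
with open('query.txt') as f:
    query = [float(x) for x in f.read().split()]

for line in sys.stdin:
    fields = line.split()
    if len(fields) != 101:      # imageid followed by 100 components
        continue                # skip malformed records
    imageid = fields[0]
    vector = [float(x) for x in fields[1:]]
    # Euclidean distance between the query and this image's vector.
    dist = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))
    # Emit <distance TAB imageid>; streaming treats the text before the
    # first tab as the key.
    print('%.6f\t%s' % (dist, imageid))

One caveat: streaming compares keys as plain text, so distances printed this way sort lexicographically rather than numerically; a numeric comparator such as org.apache.hadoop.mapred.lib.KeyFieldBasedComparator would be needed for a true numeric ordering.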
I tried to use the following command:
bin/hadoop jar ./mapred/contrib/streaming/
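For reference, a streaming invocation of this general shape (using the hypothetical mapper.py from the sketch above and placeholder input/output paths) would look like:

bin/hadoop jar ./mapred/contrib/streaming/hadoop-streaming-*.jar \
    -input <input dir> \
    -output <output dir> \
    -mapper mapper.py \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
    -file mapper.py \
    -file query.txt \
    -numReduceTasks 1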
The output stream is shown below. The failure is in the mapper itself, more specifically in the TextOutputReader. I am not sure how to fix this. The logs are attached below:
11/04/13 13:22:15 INFO security.Groups: Group mapping impl=org.apache.hadoop.
11/04/13 13:22:15 WARN conf.Configuration: mapred.used.
STREAM: addTaskEnvironment=
STREAM: shippedCanonFiles_=[]
STREAM: shipped: false /usr/local/hadoop/file1
STREAM: cmd=file1
STREAM: cmd=null
STREAM: shipped: false /usr/local/hadoop/org.apache.
STREAM: cmd=org.apache.hadoop.mapred.
11/04/13 13:22:15 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
STREAM: Found runtime classes in: /usr/local/hadoop-hadoop/
packageJobJar: [/usr/local/hadoop-hadoop/
JarBuilder.addNamedStream META-INF/MANIFEST.MF
JarBuilder.addNamedStream org/apache/hadoop/typedbytes/
JarBuilder.addNamedStream org/apache/hadoop/streaming/
STREAM: ==== JobConf properties:
STREAM: dfs.block.access.key.update.
STREAM: dfs.block.access.token.enable=
STREAM: dfs.block.access.token.
STREAM: dfs.blockreport.initialDelay=0
STREAM: dfs.blockreport.intervalMsec=
STREAM: dfs.blocksize=67108864
STREAM: dfs.bytes-per-checksum=512
STREAM: dfs.client-write-packet-size=
STREAM: dfs.client.block.write.
STREAM: dfs.client.https.keystore.
STREAM: dfs.client.https.need-auth=
STREAM: dfs.datanode.address=0.0.0.0:
STREAM: dfs.datanode.balance.
STREAM: dfs.datanode.data.dir=file://$
STREAM: dfs.datanode.data.dir.perm=755
STREAM: dfs.datanode.directoryscan.
STREAM: dfs.datanode.dns.interface=
STREAM: dfs.datanode.dns.nameserver=
STREAM: dfs.datanode.du.reserved=0
STREAM: dfs.datanode.failed.volumes.
STREAM: dfs.datanode.handler.count=3
STREAM: dfs.datanode.http.address=0.0.
STREAM: dfs.datanode.https.address=0.
STREAM: dfs.datanode.ipc.address=0.0.
STREAM: dfs.default.chunk.view.size=
STREAM: dfs.heartbeat.interval=3
STREAM: dfs.https.enable=false
STREAM: dfs.https.server.keystore.
STREAM: dfs.namenode.accesstime.
STREAM: dfs.namenode.backup.address=0.
STREAM: dfs.namenode.backup.http-
STREAM: dfs.namenode.checkpoint.dir=
STREAM: dfs.namenode.checkpoint.edits.
STREAM: dfs.namenode.checkpoint.
STREAM: dfs.namenode.checkpoint.size=
STREAM: dfs.namenode.decommission.
STREAM: dfs.namenode.delegation.key.
STREAM: dfs.namenode.delegation.token.
STREAM: dfs.namenode.edits.dir=${dfs.
STREAM: dfs.namenode.handler.count=10
STREAM: dfs.namenode.http-address=0.0.
STREAM: dfs.namenode.https-address=0.
STREAM: dfs.namenode.logging.level=
STREAM: dfs.namenode.max.objects=0
STREAM: dfs.namenode.name.dir=file://$
STREAM: dfs.namenode.replication.
STREAM: dfs.namenode.replication.min=1
STREAM: dfs.namenode.safemode.
STREAM: dfs.namenode.secondary.http-
STREAM: dfs.permissions.enabled=true
STREAM: dfs.permissions.
STREAM: dfs.replication=1
STREAM: dfs.replication.max=512
STREAM: dfs.stream-buffer-size=4096
STREAM: dfs.web.ugi=webuser,webgroup
STREAM: file.blocksize=67108864
STREAM: file.bytes-per-checksum=512
STREAM: file.client-write-packet-size=
STREAM: file.replication=1
STREAM: file.stream-buffer-size=4096
STREAM: fs.AbstractFileSystem.file.
STREAM: fs.AbstractFileSystem.hdfs.
STREAM: fs.automatic.close=true
STREAM: fs.checkpoint.dir=${hadoop.
STREAM: fs.checkpoint.edits.dir=${fs.
STREAM: fs.checkpoint.period=3600
STREAM: fs.checkpoint.size=67108864
STREAM: fs.defaultFS=hdfs://localhost:
STREAM: fs.df.interval=60000
STREAM: fs.file.impl=org.apache.
STREAM: fs.ftp.impl=org.apache.hadoop.
STREAM: fs.har.impl=org.apache.hadoop.
STREAM: fs.har.impl.disable.cache=true
STREAM: fs.hdfs.impl=org.apache.
STREAM: fs.hftp.impl=org.apache.
STREAM: fs.hsftp.impl=org.apache.
STREAM: fs.kfs.impl=org.apache.hadoop.
STREAM: fs.ramfs.impl=org.apache.
STREAM: fs.s3.block.size=67108864
STREAM: fs.s3.buffer.dir=${hadoop.tmp.
STREAM: fs.s3.impl=org.apache.hadoop.
STREAM: fs.s3.maxRetries=4
STREAM: fs.s3.sleepTimeSeconds=10
STREAM: fs.s3n.block.size=67108864
STREAM: fs.s3n.impl=org.apache.hadoop.
STREAM: fs.trash.interval=0
STREAM: ftp.blocksize=67108864
STREAM: ftp.bytes-per-checksum=512
STREAM: ftp.client-write-packet-size=
STREAM: ftp.replication=3
STREAM: ftp.stream-buffer-size=4096
STREAM: hadoop.common.configuration.
STREAM: hadoop.hdfs.configuration.
STREAM: hadoop.logfile.count=10
STREAM: hadoop.logfile.size=10000000
STREAM: hadoop.rpc.socket.factory.
STREAM: hadoop.security.
STREAM: hadoop.security.authorization=
STREAM: hadoop.tmp.dir=/usr/local/
STREAM: hadoop.util.hash.type=murmur
STREAM: io.bytes.per.checksum=512
STREAM: io.compression.codecs=org.
STREAM: io.file.buffer.size=4096
STREAM: io.map.index.skip=0
STREAM: io.mapfile.bloom.error.rate=0.
STREAM: io.mapfile.bloom.size=1048576
STREAM: io.native.lib.available=true
STREAM: io.seqfile.compress.blocksize=
STREAM: io.seqfile.lazydecompress=true
STREAM: io.seqfile.local.dir=${hadoop.
STREAM: io.seqfile.sorter.recordlimit=
STREAM: io.serializations=org.apache.
STREAM: io.skip.checksum.errors=false
STREAM: ipc.client.connect.max.
STREAM: ipc.client.connection.
STREAM: ipc.client.idlethreshold=4000
STREAM: ipc.client.kill.max=10
STREAM: ipc.client.tcpnodelay=false
STREAM: ipc.server.listen.queue.size=
STREAM: ipc.server.tcpnodelay=false
STREAM: kfs.blocksize=67108864
STREAM: kfs.bytes-per-checksum=512
STREAM: kfs.client-write-packet-size=
STREAM: kfs.replication=3
STREAM: kfs.stream-buffer-size=4096
STREAM: map.sort.class=org.apache.
STREAM: mapred.child.java.opts=-
STREAM: mapred.input.format.class=org.
STREAM: mapred.map.runner.class=org.
STREAM: mapred.mapper.class=org.
STREAM: mapred.output.format.class=
STREAM: mapred.reducer.class=org.
STREAM: mapreduce.client.completion.
STREAM: mapreduce.client.
STREAM: mapreduce.client.output.
STREAM: mapreduce.client.
STREAM: mapreduce.client.submit.file.
STREAM: mapreduce.cluster.local.dir=${
STREAM: mapreduce.cluster.temp.dir=${
STREAM: mapreduce.input.
STREAM: mapreduce.job.cache.symlink.
STREAM: mapreduce.job.committer.setup.
STREAM: mapreduce.job.complete.cancel.
STREAM: mapreduce.job.end-
STREAM: mapreduce.job.jar=/tmp/
STREAM: mapreduce.job.jvm.numtasks=1
STREAM: mapreduce.job.maps=2
STREAM: mapreduce.job.maxtaskfailures.
STREAM: mapreduce.job.output.key.
STREAM: mapreduce.job.output.value.
STREAM: mapreduce.job.queuename=
STREAM: mapreduce.job.reduce.
STREAM: mapreduce.job.reduces=1
STREAM: mapreduce.job.speculative.
STREAM: mapreduce.job.split.metainfo.
STREAM: mapreduce.job.userlog.retain.
STREAM: mapreduce.job.working.dir=
STREAM: mapreduce.jobtracker.address=
STREAM: mapreduce.jobtracker.expire.
STREAM: mapreduce.jobtracker.handler.
STREAM: mapreduce.jobtracker.
STREAM: mapreduce.jobtracker.http.
STREAM: mapreduce.jobtracker.
STREAM: mapreduce.jobtracker.maxtasks.
STREAM: mapreduce.jobtracker.persist.
STREAM: mapreduce.jobtracker.restart.
STREAM: mapreduce.jobtracker.
STREAM: mapreduce.jobtracker.staging.
STREAM: mapreduce.jobtracker.system.
STREAM: mapreduce.jobtracker.