Apache Nutch won't execute crawl

Post by jerrittpace » 26 December 2015 21:32

I am getting an error in Nutch 1.11, 1.5.1, and 2.3 when I try to execute a crawl command, such as:

bin/crawl C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls solr.server.url=http://localhost:8983/solr/collections1 urls/ 2

I have my Java classpath set, and Nutch is running, i.e. I get a response from $ bin/nutch, but I get the same error regardless of which version of Nutch I am using:

Error running:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/bin/nutch inject C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls -crawlId solr.server.url=http://localhost:8983/solr/collections1
Failed with exit value 127.


The full readout is:


Code:
$ bin/crawl C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls solr.server.url=http://localhost:8983/solr/collections1 urls/ 2
Injecting seed URLs
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/bin/nutch inject C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls -crawlId solr.server.url=http://localhost:8983/solr/collections1
InjectorJob: starting at 2015-12-26 15:21:26
InjectorJob: Injecting urlDir: C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls
InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
InjectorJob: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-User5\mapred\staging\User52078840406\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)

Error running:
  /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/bin/nutch inject C:/Users/User5/Documents/Nutch/apache-nutch-2.3/runtime/local/urls -crawlId solr.server.url=http://localhost:8983/solr/collections1
Failed with exit value 127.
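
One detail in the readout above: the echoed inject command passes the Solr setting after -crawlId, which suggests the script treats its second argument as a crawl ID. The sketch below only illustrates that mapping. It assumes the Nutch 2.3 usage is crawl <seedDir> <crawlID> <solrUrl> <numberOfRounds>; describe_crawl_args is a hypothetical helper and "testcrawl" is a made-up crawl ID, neither comes from Nutch, and the seed path is shortened to urls/.

```shell
#!/usr/bin/env bash
# Sketch only: describe_crawl_args is a hypothetical helper, not part of Nutch.
# It assumes the Nutch 2.3 script takes
#   crawl <seedDir> <crawlID> <solrUrl> <numberOfRounds>
# and prints which role each positional argument would get.
describe_crawl_args() {
  if [ "$#" -ne 4 ]; then
    echo "expected 4 arguments, got $#" >&2
    return 1
  fi
  printf 'seedDir=%s\ncrawlId=%s\nsolrUrl=%s\nrounds=%s\n' "$1" "$2" "$3" "$4"
}

# Argument order as posted (seed path shortened): the Solr setting
# ends up in the crawlId slot and urls/ in the solrUrl slot.
describe_crawl_args urls/ solr.server.url=http://localhost:8983/solr/collections1 urls/ 2

# A reordered call under the same assumption; "testcrawl" is made up.
describe_crawl_args urls/ testcrawl http://localhost:8983/solr/collections1 2
```

Reordering the arguments this way would not by itself explain the java.io.IOException in the trace, which is thrown from Hadoop's FileUtil.setPermission while creating the staging directory.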


I have tried many variations of how to specify the crawl directory, but I am not sure that this is the problem.

Could this be a problem with my Java installation?

Any help with this would be very much appreciated!!
jerrittpace
 
Operating System: windows
