I have started using Nutch and everything was fine until I ran into an IOException.
Crawling with Nutch throws the following IOException:
$ ./nutch crawl urls -dir myCrawl -depth 2 -topN 4
cygpath: can't convert empty path
solrUrl is not set, indexing will be skipped...
crawl started in: myCrawl
rootUrlDir = urls
threads = 10
depth = 2
solrUrl=null
topN = 4
Injector: starting at 2012-06-23 03:37:51
Injector: crawlDb: myCrawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Rahul\mapred\staging\Rahul255889423\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
@Jeffrey: I downgraded my Nutch version and now I get a new problem that is beyond my understanding. Please help.
$ ./nutch crawl urls -dir myCrawl -depth 4 -topN 5
cygpath: can't convert empty path
solrUrl is not set, indexing will be skipped...
crawl started in: myCrawl
rootUrlDir = urls
threads = 10
depth = 4
solrUrl=null
topN = 5
Injector: starting at 2012-06-23 22:30:28
Injector: crawlDb: myCrawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
What is the problem this time?
Which version of Nutch/Hadoop are you using? – Jeffrey
Nutch-1.5, Solr-03 –
I don't know about Hadoop. I'm a complete noob at using Nutch. :( –