TermDocumentMatrix sometimes throws an error

I am creating a word cloud based on tweets from various sports teams. The code below runs successfully about 1 time in 10. When it fails, this is the output:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), : 
    'i, j, v' different lengths 
In addition: Warning messages: 
1: In mclapply(unname(content(x)), termFreq, control) : 
    all scheduled cores encountered errors in user code 
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), : 
    NAs introduced by coercion

Any ideas, folks? This is the code:

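library(twitteR)       # searchTwitter
library(tm)            # Corpus, TermDocumentMatrix, stopwords
library(wordcloud)     # wordcloud
library(RColorBrewer)  # brewer.pal

handle <- 'arsenal'
# Fetch up to 1000 English tweets and extract their text
txt <- searchTwitter(handle,n=1000,lang='en')
t <- sapply(txt,function(x) x$getText())
# Strip URLs, retweet markers and the search term itself
t <- gsub('http.*\\s*|RT|Retweet','',t)
t <- gsub(handle,'',t)
t_c <- Corpus(VectorSource(t))
tdm = TermDocumentMatrix(t_c,control = list(removePunctuation = TRUE,stopwords = stopwords("english"),removeNumbers = TRUE, content_transformer(tolower)))
# Word frequencies, most frequent first
m = as.matrix(tdm)
word_freqs = sort(rowSums(m), decreasing=TRUE)
dm = data.frame(word=names(word_freqs), freq=word_freqs)
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"),rot.per=0.5)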

The other 9 times out of 10, it throws the error shown above. I have Googled it, but have come up short so far! Bear in mind that I am a complete newbie to R!

2014-09-06 Dan

So, after a bit of playing around, the following line of code has completely solved my problem:

t <- iconv(t,to="utf-8-mac")
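
The fix suggests the failures come from tweets containing bytes that are not valid in the expected encoding, though the thread does not confirm the root cause. The `"utf-8-mac"` target appears to be specific to macOS, so the sketch below shows a more portable variant using only base R. The sample vector `tweets` is hypothetical, and the `from`/`to`/`sub` choice is an assumption rather than part of the original answer.

```r
# Hypothetical sample standing in for the tweet text; the second element
# contains a raw byte (0xE9) that is not valid UTF-8.
tweets <- c("Great win for the team", "caf\xe9 before the match")

# validUTF8() reports which elements are well-formed UTF-8.
print(validUTF8(tweets))

# Re-encode as UTF-8 and drop any byte that cannot be converted (sub = "").
clean <- iconv(tweets, from = "UTF-8", to = "UTF-8", sub = "")
print(validUTF8(clean))
```

Running a check like `validUTF8()` on the tweet vector before building the corpus would show whether a given batch of tweets contains problematic bytes, which might explain why the error only occurs on some runs.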

2014-09-06 10:59:32 Dan

This fixed my problem immediately (running on a Mac). – timothyjgraham

I think you have used the following line of code somewhere before calling DocumentTermMatrix.

corpus = tm_map(corpus, PlainTextDocument)

This line converts all the text in the corpus to PlainTextDocument, on which the DocumentTermMatrix function does not work properly.

Just repeat the whole process of creating the corpus and preprocessing it, leaving out the command above, and you will be good to go.
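
As a rough sketch of what that rebuild might look like: the corpus is created from scratch and each preprocessing step is wrapped so the documents stay tm text documents, with no `tm_map(corpus, PlainTextDocument)` call. The sample `texts` vector is hypothetical, and the choice of `VCorpus` and of these particular cleaning steps is an assumption, not something the answer specifies.

```r
library(tm)

# Hypothetical sample texts standing in for the real data.
texts <- c("Arsenal win again!", "What a goal by Arsenal", "3 points for the team")

# Build the corpus from scratch.
corpus <- VCorpus(VectorSource(texts))

# Preprocess. Base functions such as tolower are wrapped in
# content_transformer() so the documents keep their tm structure.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Build the matrix; no PlainTextDocument conversion is applied anywhere.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```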

2017-05-08 13:25:35

This solved my problem. –

If you remove:

t_c <- Corpus(VectorSource(t))

then you will get the correct output from TermDocumentMatrix:

corpus = tm_map(corpus, PlainTextDocument)

You have to remove this as well.

2018-01-29 12:37:44 kalpesh
