How to generate n-grams in Scala?

I am trying to code the dissociated press algorithm, based on n-grams, in Scala. How can I generate n-grams for large files? For example, for a file containing "the bee is the bee of the bees".

First, it has to pick a random n-gram. For example, "the bee".
Then it has to look for an n-gram that starts with the last (n-1) words of the current one. For example, one starting with "bee".
It prints the last word of that n-gram. Then it repeats.

Could you please give me some hints on how to do this? Sorry for the inconvenience.

2011-11-24 user1002579

I don't know what an n-gram is. Are you just picking random words? Or is there some logic? – santiagobasulto

@santiagobasulto Wikipedia is your friend: http://en.wikipedia.org/wiki/N-gram –

Is this by any chance related to http://stackoverflow.com/questions/8256830/how-to-make-string-sequence-in-scala? –

Your question could be a bit more specific, but here is my attempt.

val words = "the bee is the bee of the bees"
// sliding(2) yields every window of two consecutive words (the 2-grams)
words.split(' ').sliding(2).foreach(p => println(p.mkString(" ")))

2011-11-24 15:08:46 peri4n

Note that this will give you only 2-grams. If n-grams are desired, then n needs to be parameterized. – tuxdna

You can try this with n as a parameter:

val words = "the bee is the bee of the bees"
val w = words.split(" ")

val n = 4
// For each size i from 1 to n, collect every window of i consecutive words
val ngrams = (1 to n).flatMap(i => w.sliding(i).map(_.toList))
ngrams foreach println

Output:

List(the) 
List(bee) 
List(is) 
List(the) 
List(bee) 
List(of) 
List(the) 
List(bees) 
List(the, bee) 
List(bee, is) 
List(is, the) 
List(the, bee) 
List(bee, of) 
List(of, the) 
List(the, bees) 
List(the, bee, is) 
List(bee, is, the) 
List(is, the, bee) 
List(the, bee, of) 
List(bee, of, the) 
List(of, the, bees) 
List(the, bee, is, the) 
List(bee, is, the, bee) 
List(is, the, bee, of) 
List(the, bee, of, the) 
List(bee, of, the, bees)
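
To connect this back to the steps in the question, n-grams of a fixed size could drive the generation loop roughly as follows. This is only a minimal sketch under stated assumptions: the names `DissociatedPressSketch`, `ngramsOf`, `generate`, `maxWords` and the `rnd` parameter are hypothetical and not from the question, n is assumed to be at least 2, and generation simply stops when no n-gram continues the current one.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Random

object DissociatedPressSketch {

  // Hypothetical helper: every window of exactly n consecutive words.
  def ngramsOf(words: Array[String], n: Int): Vector[List[String]] =
    words.sliding(n).map(_.toList).filter(_.length == n).toVector

  // Follows the steps from the question (assumes n >= 2):
  // pick a random n-gram, look for an n-gram that starts with the last
  // (n - 1) words of the current one, emit its last word, repeat.
  def generate(words: Array[String], n: Int, maxWords: Int, rnd: Random): List[String] = {
    val grams = ngramsOf(words, n)
    if (grams.isEmpty) Nil
    else {
      // Index the n-grams by their first (n - 1) words for quick lookup.
      val byPrefix = grams.groupBy(_.take(n - 1))
      var current = grams(rnd.nextInt(grams.length))
      val out = ListBuffer.empty[String]
      out ++= current
      var running = true
      while (running && out.length < maxWords) {
        byPrefix.get(current.drop(1)) match {
          case Some(candidates) =>
            current = candidates(rnd.nextInt(candidates.length))
            out += current.last
          case None =>
            running = false // dead end: no n-gram continues this one
        }
      }
      out.toList.take(maxWords)
    }
  }
}
```

For example, `DissociatedPressSketch.generate(w, 2, 20, new Random())` would return up to 20 words in which every adjacent pair is a 2-gram of the input. The result is random, so it differs from run to run unless the `Random` is seeded.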

2013-05-24 09:58:58 tuxdna

Here is a stream-based approach. It will not require much memory while computing the n-grams.

object ngramstream extends App {

  // Applies f to each element of the stream in order, then returns an empty stream
  def process(st: Stream[Array[String]])(f: Array[String] => Unit): Stream[Array[String]] = st match {
    case x #:: xs =>
      f(x)
      process(xs)(f)
    case _ => Stream[Array[String]]()
  }

  // Concatenates the 2-grams, 3-grams, ..., n-grams of words into one stream
  def ngrams(n: Int, words: Array[String]) = {
    // exclude 1-grams
    (2 to n).map { i => words.sliding(i).toStream }
      .foldLeft(Stream[Array[String]]()) {
        (a, b) => a #::: b
      }
  }

  val words = "the bee is the bee of the bees"
  val n = 4
  val ngrams2 = ngrams(n, words.split(" "))

  process(ngrams2) { x =>
    println(x.toList)
  }

}

Output:

List(the, bee) 
List(bee, is) 
List(is, the) 
List(the, bee) 
List(bee, of) 
List(of, the) 
List(the, bees) 
List(the, bee, is) 
List(bee, is, the) 
List(is, the, bee) 
List(the, bee, of) 
List(bee, of, the) 
List(of, the, bees) 
List(the, bee, is, the) 
List(bee, is, the, bee) 
List(is, the, bee, of) 
List(the, bee, of, the) 
List(bee, of, the, bees)
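
Since the question mentions large files, a similar result might also be reached without ever holding all the words in memory, by sliding over an `Iterator` of words. `Stream` memoizes the elements it has produced, so an `Iterator` may be a lighter fit for file input. The following is a sketch only: `NGramIteratorSketch` and `ngramsFromLines` are hypothetical names, a single fixed n is used, and words are assumed to be separated by whitespace.

```scala
object NGramIteratorSketch {

  // Hypothetical helper: lazily yields the n-grams of a text given line by line.
  // Assumes words are separated by whitespace; n-grams may span line breaks.
  def ngramsFromLines(lines: Iterator[String], n: Int): Iterator[List[String]] =
    lines
      .flatMap(_.split("\\s+").iterator) // one word at a time
      .filter(_.nonEmpty)                // skip blanks from empty or indented lines
      .sliding(n)
      .withPartial(false)                // drop a trailing window shorter than n
      .map(_.toList)
}
```

With a real file, the lines could come from `scala.io.Source.fromFile(path).getLines()`, where `path` is a placeholder, and the resulting iterator can be consumed with `foreach` just like the stream above.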

2013-12-17 12:48:58 tuxdna

I like this, but I'm not sure about the usefulness of 'process'. Why not just 'ngrams(...).foreach(x => println(x.toList))'? – Mortimer

@Mortimer: Interesting question. 'process' is just an extra function. We can certainly use 'ngrams2 foreach { x => println(x.toList) }'. Thanks :-) – tuxdna
