Input
$ cat test.csv
ID,Task,label,Text
1,Collect Information,no response,cozily married practical athletics Mr. Brown flat
2,New Credit,no response,active married expensive soccer Mr. Chang flat
3,Collect Information,response,healthy single expensive badminton Mrs. Green flat
4,Collect Information,response,cozily married practical soccer Mr. Brown hierachical
5,Collect Information,response,cozily single practical badminton Mr. Brown flat
TL;DR
>>> from nltk import word_tokenize, pos_tag, pos_tag_sents
>>> import pandas as pd
>>> df = pd.read_csv('test.csv', sep=',')
>>> df['Text']
0 cozily married practical athletics Mr. Brown flat
1 active married expensive soccer Mr. Chang flat
2 healthy single expensive badminton Mrs. Green ...
3 cozily married practical soccer Mr. Brown hier...
4 cozily single practical badminton Mr. Brown flat
Name: Text, dtype: object
>>> texts = df['Text'].tolist()
>>> tagged_texts = pos_tag_sents(map(word_tokenize, texts))
>>> tagged_texts
[[('cozily', 'RB'), ('married', 'JJ'), ('practical', 'JJ'), ('athletics', 'NNS'), ('Mr.', 'NNP'), ('Brown', 'NNP'), ('flat', 'JJ')], [('active', 'JJ'), ('married', 'VBD'), ('expensive', 'JJ'), ('soccer', 'NN'), ('Mr.', 'NNP'), ('Chang', 'NNP'), ('flat', 'JJ')], [('healthy', 'JJ'), ('single', 'JJ'), ('expensive', 'JJ'), ('badminton', 'NN'), ('Mrs.', 'NNP'), ('Green', 'NNP'), ('flat', 'JJ')], [('cozily', 'RB'), ('married', 'JJ'), ('practical', 'JJ'), ('soccer', 'NN'), ('Mr.', 'NNP'), ('Brown', 'NNP'), ('hierachical', 'JJ')], [('cozily', 'RB'), ('single', 'JJ'), ('practical', 'JJ'), ('badminton', 'NN'), ('Mr.', 'NNP'), ('Brown', 'NNP'), ('flat', 'JJ')]]
>>> df['POS'] = tagged_texts
>>> df
   ID                 Task        label  \
0   1  Collect Information  no response
1   2           New Credit  no response
2   3  Collect Information     response
3   4  Collect Information     response
4   5  Collect Information     response

                                                Text  \
0  cozily married practical athletics Mr. Brown flat
1     active married expensive soccer Mr. Chang flat
2  healthy single expensive badminton Mrs. Green ...
3  cozily married practical soccer Mr. Brown hier...
4   cozily single practical badminton Mr. Brown flat

                                                 POS
0  [(cozily, RB), (married, JJ), (practical, JJ),...
1  [(active, JJ), (married, VBD), (expensive, JJ)...
2  [(healthy, JJ), (single, JJ), (expensive, JJ),...
3  [(cozily, RB), (married, JJ), (practical, JJ),...
4  [(cozily, RB), (single, JJ), (practical, JJ), ...
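Each cell of the new POS column holds a list of (token, tag) tuples, which the truncated display hides. A minimal sketch of unpacking one such cell, using the first row's tagged output copied from the session above; the variable names are illustrative only:

```python
# First row of tagged_texts, copied from the session output above.
tagged_row = [('cozily', 'RB'), ('married', 'JJ'), ('practical', 'JJ'),
              ('athletics', 'NNS'), ('Mr.', 'NNP'), ('Brown', 'NNP'),
              ('flat', 'JJ')]

# Split the (token, tag) pairs into two parallel tuples.
tokens, tags = zip(*tagged_row)

# Or flatten the row into a single "token/TAG" string.
joined = ' '.join(f'{token}/{tag}' for token, tag in tagged_row)
```

The same unpacking could be applied to every row with df['POS'].apply(...) if a plain-string column is easier to work with than a column of lists.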
In long:
First, you can extract the Text column into a list of strings:
texts = df['Text'].tolist()
Then you can apply the word_tokenize function:
map(word_tokenize, texts)
Note that @Boud's suggestion, which uses df.apply, is almost the same.
Then you dump the tokenized text into a list of lists of strings:
df['Text'].apply(word_tokenize).tolist()
Then you can use pos_tag_sents:
pos_tag_sents(df['Text'].apply(word_tokenize).tolist())
Then you add the column back to the DataFrame:
df['POS'] = pos_tag_sents(df['Text'].apply(word_tokenize).tolist())
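The steps above (extract the strings, tokenize each one, tag all of them in one batch call, hand the result back) can be wrapped in a small helper. This is only a sketch: tag_texts is a hypothetical name, not part of NLTK or pandas, and the tokenizer and batch tagger are passed in as arguments so the helper itself depends on nothing but the standard library.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]


def tag_texts(
    texts: Sequence[str],
    tokenize: Callable[[str], List[str]],
    tag_sents: Callable[[List[List[str]]], List[Tagged]],
) -> List[Tagged]:
    """Tokenize every text, then tag all token lists in a single batch call.

    Hypothetical helper: `tokenize` and `tag_sents` are supplied by the
    caller, e.g. NLTK's word_tokenize and pos_tag_sents.
    """
    # One list of tokens per input text.
    tokenized = [tokenize(text) for text in texts]
    # Tag the whole batch at once rather than row by row.
    tagged = list(tag_sents(tokenized))
    # A column assigned back to a DataFrame must have one entry per row.
    if len(tagged) != len(texts):
        raise ValueError(
            f"expected {len(texts)} tagged rows, got {len(tagged)}"
        )
    return tagged
```

With the imports from the session above, the call would look like df['POS'] = tag_texts(df['Text'].tolist(), word_tokenize, pos_tag_sents).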
How many rows do you have? – alvas
There are 20,000 rows. – mobcdi
That's not a problem. Just extract the column as a list of strings, process it, and then add it back to the DataFrame as a column. – alvas