pandas DataFrame str.contains() AND operation

df (a pandas DataFrame) has three rows.

col_name
"apple is delicious" 
"banana is delicious" 
"apple and banana both are delicious"

df.col_name.str.contains("apple|banana")

will catch all of the rows:

"apple is delicious", "banana is delicious", "apple and banana both are delicious".

How do I apply an AND operator to the str.contains method, so that it only grabs strings that contain BOTH apple & banana?

"apple and banana both are delicious"

I'd like to grab strings that contain 10-20 different words (grape, watermelon, berry, orange, ..., etc.)

Source

2016-05-03 Aaron

Here is what you can do:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
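
For a longer word list, writing one `&` clause per word gets tedious. A minimal sketch of the same idea, assuming the sample data from the question and a hypothetical `words` list, builds one boolean mask per word and combines them with `functools.reduce`:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"col_name": [
    "apple is delicious",
    "banana is delicious",
    "apple and banana both are delicious",
]})

# Hypothetical word list; extend it to the 10-20 words you need.
words = ["apple", "banana"]

# One boolean mask per word; regex=False treats each word as a literal substring.
masks = [df["col_name"].str.contains(w, regex=False) for w in words]

# AND all masks together, equivalent to mask1 & mask2 & ...
all_present = reduce(operator.and_, masks)

result = df[all_present]
print(result)
```

Only the row containing every word in `words` survives the filter.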

Source

2016-05-03 18:35:41 flyingmeatball

You can also do it in regex expression style:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build your list of words into a regex string like so:

base = r'^{}' 
expr = '(?=.*{})' 
words = ['apple', 'banana', 'cat'] # example 
base.format(''.join(expr.format(w) for w in words))

This renders:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

Then you can build the pattern dynamically from whatever word list you have.
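
If the words may contain regex metacharacters (for example `c++` or `a.b`), it is probably safer to escape them before formatting. The following is a sketch only: `all_words_pattern` is a hypothetical helper name, and the DataFrame is the sample from the question.

```python
import re

import pandas as pd


def all_words_pattern(words):
    """Build '^(?=.*w1)(?=.*w2)...' with each word escaped for literal matching."""
    return "^" + "".join("(?=.*{})".format(re.escape(w)) for w in words)


df = pd.DataFrame({"col_name": [
    "apple is delicious",
    "banana is delicious",
    "apple and banana both are delicious",
]})

pattern = all_words_pattern(["apple", "banana"])
print(pattern)

# Each lookahead must succeed from the start of the string,
# so a row matches only if every word occurs somewhere in it.
matched = df[df["col_name"].str.contains(pattern)]
print(matched)
```

Note that `.` does not match a newline by default, so multi-line cell values would need the `re.DOTALL` flag passed through the `flags` argument of `str.contains`.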

Source

2016-05-03 18:42:22 Anzel

Try this regex:

apple.*banana|banana.*apple

The code is:

import pandas as pd 

df = pd.DataFrame([[1,"apple is delicious"],[2,"banana is delicious"],[3,"apple and banana both are delicious"]],columns=('ID','String_Col')) 

print(df[df['String_Col'].str.contains(r'apple.*banana|banana.*apple')])

Output:

   ID                           String_Col
2   3  apple and banana both are delicious
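
The alternation has to list every ordering of the words, so it grows factorially with the word count. As a rough illustration, with `any_order_pattern` being a hypothetical helper and not part of pandas, the pattern can be generated with `itertools.permutations`:

```python
from itertools import permutations
import re


def any_order_pattern(words):
    """Join every ordering of the (escaped) words with '.*', separated by '|'."""
    escaped = [re.escape(w) for w in words]
    return "|".join(".*".join(order) for order in permutations(escaped))


two = any_order_pattern(["apple", "banana"])
print(two)

# Three words already need 3! = 6 alternatives.
three = any_order_pattern(["apple", "banana", "cat"])
n_alternatives = three.count("|") + 1
print(n_alternatives)
```

For the 10-20 words mentioned in the question this approach is impractical (10! is over three million alternatives), so it is best kept for two or three words.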

Source

2016-05-03 18:54:50 pmaniyan

import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious", 
                           "banana is delicious", 
                           "apple and banana both are delicious"]}) 

targets = ['apple', 'banana'] 

# Any word from `targets` is present in the sentence. 
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets)) 
0    True 
1    True 
2    True 
Name: col, dtype: bool 

# All words from `targets` are present in the sentence. 
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets)) 
0    False 
1    False 
2     True 
Name: col, dtype: bool
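
The lambda above raises a TypeError on missing values, because `word in sentence` is not defined when `sentence` is `None` or NaN, and it is case-sensitive. A sketch of a more defensive variant follows; `contains_all` is a hypothetical helper, and the sample data is altered here to include a missing value and mixed case:

```python
import pandas as pd

# Sample data modified for illustration: one missing cell, mixed case.
df = pd.DataFrame({"col": [
    "Apple is delicious",
    None,
    "apple and BANANA both are delicious",
]})

targets = ["apple", "banana"]


def contains_all(sentence, words):
    """Return True only if `sentence` is a string containing every word, ignoring case."""
    if not isinstance(sentence, str):
        return False  # None / NaN cells never match
    lowered = sentence.lower()
    return all(w.lower() in lowered for w in words)


mask = df["col"].apply(lambda s: contains_all(s, targets))
print(df[mask])
```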

Source

2016-05-03 19:57:46 Alexander

If you want to catch at least two of the words in a sentence, maybe this will work (taking the tip from @Alexander):

target=['apple','banana','grapes','orange'] 
connector_list=['and'] 
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (all(connector in sentence for connector in connector_list)))]

Output:

                                   col
2  apple and banana both are delicious

If you want to catch more than two words that are separated by a comma ',', add it to connector_list and change the second condition from all to any:

connector_list=['and', ','] 
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (any(connector in sentence for connector in connector_list)))]

Output:

                                         col
2        apple and banana both are delicious
3  orange,banana and apple all are delicious
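
The connector approach infers "two or more words" from the presence of 'and' or ',', so it depends on how the sentence is phrased. An alternative sketch, not taken from the answer above, counts how many target words each row contains and keeps the rows with at least two. The four-row DataFrame is an assumption reconstructed from the outputs shown.

```python
import pandas as pd

# Assumed data: the three rows from the question plus the fourth row seen in the output.
df = pd.DataFrame({"col": [
    "apple is delicious",
    "banana is delicious",
    "apple and banana both are delicious",
    "orange,banana and apple all are delicious",
]})

target = ["apple", "banana", "grapes", "orange"]

# One boolean column per target word, then count the True values in each row.
hits = pd.concat(
    [df["col"].str.contains(w, regex=False) for w in target], axis=1
).sum(axis=1)

at_least_two = df[hits >= 2]
print(at_least_two)
```

Changing the threshold to `len(target)` turns this into the "all words present" filter asked for in the question.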

Source

2016-05-04 20:07:16
