Finding elements by attribute with lxml

I need to parse an XML file to extract some data.

<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>

I only need some elements with certain attributes; the document above is an example. Here I would like to get only the articles of type "news". What is the most efficient and elegant way to do this with lxml?

I tried with the find method, but it is not very nice:

from lxml import etree

f = etree.parse("myfile")
root = f.getroot()
articles = root[0]  # first child, <articles>; getchildren() is deprecated
article_list = articles.findall('article')
for article in article_list:
    # Only handle <article> elements whose type attribute is "news"
    if "type" in article.keys():
        if article.attrib['type'] == 'news':
            content = article.find('content')
            content = content.text

Source

2011-02-23 Jérôme Pigeot

You can use XPath, e.g. root.xpath("//article[@type='news']")

This XPath expression returns a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over it to do whatever you want, or pass it on wherever it is needed.

To get just the text content, you can extend the XPath like so:
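As a minimal sketch of that iteration: the sample document is inlined here, and its content values are changed to "first", "second" and "third" purely so the matches can be told apart. The variable names are illustrative only.

```python
from lxml import etree

# Sample document from the question, inlined; the content values are
# altered for illustration so the two matches are distinguishable.
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
""")

news_articles = root.xpath("//article[@type='news']")

# Each match is a regular element: attributes are read with .get()
# and child text with .findtext().
texts = [article.findtext("content") for article in news_articles]
for article, text in zip(news_articles, texts):
    print(article.get("type"), text)
```

Because the result is a plain Python list of elements, it can be sliced, counted with len(), or handed to any other function.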

from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
""")

# text() selects the text nodes of the matching <content> elements
print(root.xpath("//article[@type='news']/content/text()"))

This will output ['some text', 'some text']. Or, if you only wanted the content elements, it would be "//article[@type='news']/content", and so on.
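A short sketch of the element-returning variant, using the same sample document inlined; the names contents, tags and texts are illustrative only.

```python
from lxml import etree

# Same sample document as in the question, inlined for illustration.
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>some text</content></article>
        <article type="info"><content>some text</content></article>
        <article type="news"><content>some text</content></article>
    </articles>
</root>
""")

# Without text(), the expression yields the <content> elements themselves.
contents = root.xpath("//article[@type='news']/content")

tags = [c.tag for c in contents]
texts = [c.text for c in contents]
print(tags, texts)
```

Keeping the elements rather than the bare strings is useful when you also need their attributes or want to modify them in place.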

Source

2011-02-23 15:36:09

Just for reference, you can achieve the same result with findall:

from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
""")

articles = root.find("articles")
# findall accepts a simple [@attribute='value'] predicate in the path
article_list = articles.findall("article[@type='news']/content")
for a in article_list:
    print(a.text)
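As a hedged variation on the same idea, the intermediate find("articles") step can be skipped by starting the path with ".//", which searches all descendants of the element it is called on. The sample document is inlined and the variable names are illustrative only.

```python
from lxml import etree

# Same sample document as in the question, inlined for illustration.
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>some text</content></article>
        <article type="info"><content>some text</content></article>
        <article type="news"><content>some text</content></article>
    </articles>
</root>
""")

# ".//" makes the search relative to root and descends through all levels.
content_elements = root.findall(".//article[@type='news']/content")

# findall returns elements, so the text is read from each element's .text.
found_texts = [element.text for element in content_elements]
print(found_texts)
```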

Source

2015-02-02 10:09:55 Kjir
