Python XML: Expat, StartElementHandlerInvestigate ways to parse XML. Use Expat, in xml.parsers.expat.
XML. Many programs require XML support. XML is a markup language that is commonly used for configuration and data files. In Python, several modules offer XML support. They have special features and advantages. We test Expat, one XML library.
To begin, we add the xml.parsers.expat module with an import statement. In this example, we add start tags, and element data, to a list. We assign the StartElementHandler and CharacterDataHandler to lambda expressions.
Tip A def-method name could instead be used. Lambda expressions may be sufficient if you need just one statement.
Info For StartElementHandler, we append the tag name to the list. For CharacterDataHandler, we append the data.
And This yields a list containing start element names, and the contents of those elements.
Note Many other handlers, including EndElementHandler and CommentHandler are available. Please see the Python documentation.
Newlines. Please notice how newlines are treated as character data. This is not the ideal effect for most programs. Newlines could be ignored, or filtered out of the list with helper methods. This would yield a better data model.
Python program that uses xml.parsers.expat
import xml.parsers.expat # Will store tag names and char data. list = [] # Create the parser. parser = xml.parsers.expat.ParserCreate() # Specify handlers. parser.StartElementHandler = lambda name, attrs: list.append(name) parser.CharacterDataHandler = lambda data: list.append(data) # Parse a string. parser.Parse("""<?xml version="1.0"?> <item><name>Sam</name> <name>Mark</name> </item>""", True) # Print the items in our list. print(list)
['item', 'name', 'Sam', '\n', 'name', 'Mark', '\n']
Discussion. Expat is not a Python technology. It is an older XML library created by James Clark in 1998. Written in C, it has excellent performance: it is noted as a "fast" parser. It does no validation. Generally, raw C code outperforms Python code.
So For performance, Expat is a good choice. It may be harder to use than other solutions. This is a tradeoff you must evaluate.
Summary. Nearly every developer will encounter XML files and need to parse them. No one way is ideal. A custom string-based parser, written in Python, is sometimes a good choice. A regular expression, with re, may be effective.
re.match, search
But A C-based, optimized XML parser like Expat is likely one of the fastest options. It requires less testing: it is already developed.
© 2007-2021 sam allen.
see site info on the changelog.