Investigate ways to parse XML. Use Expat, in xml.parsers.expat.
XML. Many programs require XML support. XML is a markup language that is commonly used for configuration and data files. In Python, several modules offer XML support. They have special features and advantages. We test Expat, one XML library.
To begin, we add the xml.parsers.expat module with an import statement. In this example, we add start tags, and element data, to a list. We assign the StartElementHandler and CharacterDataHandler to lambda expressions.
Tip: A def-method name could instead be used. Lambda expressions may be sufficient if you need just one statement.
Python program that uses xml.parsers.expat
# Will store tag names and char data.
list = 
# Create the parser.
parser = xml.parsers.expat.ParserCreate()
# Specify handlers.
parser.StartElementHandler = lambda name, attrs: list.append(name)
parser.CharacterDataHandler = lambda data: list.append(data)
# Parse a string.
# Print the items in our list.
['item', 'name', 'Sam', '\n', 'name', 'Mark', '\n']
Newlines. Please notice how newlines are treated as character data. This is not the ideal effect for most programs. Newlines could be ignored, or filtered out of the list with helper methods. This would yield a better data model.
Discussion. Expat is not a Python technology. It is an older XML library created by James Clark in 1998. Written in C, it has excellent performance: it is noted as a "fast" parser. It does no validation. Generally, raw C code outperforms Python code.
So: For performance, Expat is a good choice. It may be harder to use than other solutions. This is a tradeoff you must evaluate.
Summary. Nearly every developer will encounter XML files and need to parse them. No one way is ideal. A custom string-based parser, written in Python, is sometimes a good choice. A regular expression, with re, may be effective.re.match, search
But: A C-based, optimized XML parser like Expat is likely one of the fastest options. It requires less testing: it is already developed.