Convert XML to a List of Dictionaries in Python using built-in data types and functions

This post shows how to parse a basic XML document using only Python’s built-in data types and functions.

There are a few reasons for resorting to Python’s built-in functions and data types to parse an XML document.

  • Your Python build was not compiled with the XML libraries.
  • You’re working within an application, such as Zope, which restricts the Python libraries that can be imported for use.
  • Your XML source is very basic (no element attributes, 2 levels deep), so you don’t need advanced parsing.
  • Performance: for small, simple documents, plain string searches avoid the overhead of a full XML parser.

 

The commented code below will work for the most basic of XML documents. Attribute values are never captured, a parent element that carries attributes will not be matched at all, and documents that are more than 2 levels deep will not parse correctly.
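As an illustration of what "most basic" means here, the following sketch builds a hypothetical two-level document (the `records` wrapper, the field names and the values are all invented for this example) and shows how splitting on the parent start tag separates the records. It also shows why a parent tag with an attribute is a problem for this approach.

```python
# Hypothetical sample document: a wrapper, then records, then plain-text fields.
sampleXml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<records>'
    '<record><created>2009-01-02</created><subject>Second</subject></record>'
    '<record><created>2009-01-01</created><subject>First</subject></record>'
    '</records>'
)

# Splitting on the parent start tag yields one chunk per record,
# preceded by a chunk holding the XML declaration and the wrapper tag.
chunks = sampleXml.split('<record>')
header = chunks.pop(0)
print(len(chunks))   # 2

# A parent start tag that carries an attribute never equals the delimiter,
# so the split finds no records at all.
withAttr = '<records><record id="1"><created>x</created></record></records>'
print(len(withAttr.split('<record>')))   # 1
```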

The result is a list of dictionaries. Each item in the list corresponds to one record in the XML structure, and each key in a dictionary corresponds to a child element within that record.
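To make that shape concrete, here is a hand-written sketch of what the result could look like for two records. The values are made up for illustration; a child element that is missing from a record is stored as None.

```python
# Illustrative result only; the field values are invented.
xmlList = [
    {'created': '2009-01-01', 'description': 'First entry',
     'subject': 'Hello', 'body': 'Some text'},
    {'created': '2009-01-02', 'description': None,   # child tag was missing
     'subject': 'Again', 'body': 'More text'},
]

# Each list item is one record; each key is one child element.
for record in xmlList:
    print(record['created'], record['subject'])
```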

You’ll most likely use the code below to create a function, where the xmlSource, sortKey and sortOrder can be passed in as arguments.

 

xmlSource = "" # define your XML source here

# define the parent (record) element and the child elements to extract from it
elementMap = {
    'parent': 'record',
    'children': (
        'created',
        'description',
        'subject',
        'body',
    ),
}

sortKey = 'created'           # must be one of the children listed in elementMap
sortOrder = 'asc'             # asc for Ascending, desc for Descending

# turn the xmlSource string into a list, using the 'parent' start tag as a delimiter
parents = xmlSource.split('<' + elementMap['parent'] + '>')

# remove first element in list, as it contains only encoding and body or wrapper element information
parents.pop(0)

# instantiate list and contained dictionary element
xmlList = []
xmlDict = {}

# performance - "grab" children key from elementMap, so it doesn't need to be retrieved from the elementMap at each loop iteration
children = elementMap['children']

# loop through parent element list
for parent in parents:

    # start a fresh dictionary so it can be populated by this parent record
    xmlDict = {}

    # identify children and store them in xmlDict with each iteration
    for child in children:

        # find child start tag (named openPos so the built-in open() is not shadowed)
        openPos = parent.find('<' + child)

        # if child tag cannot be found, add it to the xmlDict with a value of None
        # if child is found, attempt to locate closing tag
        if openPos == -1:
            xmlDict[child] = None
        else:
            # find > to complete the open tag, skipping any attributes of the child element
            openPos = parent.find('>', openPos)

            # search for the closing tag only after the open tag
            closePos = parent.find('</' + child + '>', openPos)

            # if closing tag cannot be found, add child to the xmlDict with a value of None
            # if closing tag is found, add child to the xmlDict with data contained
            if closePos == -1:
                xmlDict[child] = None
            else:
                # locate contained data and store to xmlDict
                childData = parent[openPos + 1 : closePos]
                xmlDict[child] = childData

    # finally, append the xmlDict dictionary to the xmlList
    xmlList.append(xmlDict)


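# sort the list of dictionaries according to the sortKey and sortOrder
# key= and reverse= work on Python 2.4+ and on Python 3, where cmp() no longer exists
# records whose sortKey element was missing hold None and are sorted as empty strings
if sortKey:
    xmlList.sort(key=lambda record: record[sortKey] or '',
                 reverse=(sortOrder.lower() == 'desc'))


# once this code is wrapped in a function, hand the result back to the caller
return xmlList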
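
Wrapped up as a function, as suggested earlier, the steps above might look like the sketch below. The function name `xmlToList`, its default arguments and the sample document are assumptions made for this example. It also assumes Python 2.4 or later (or Python 3), since it sorts with `key=` and `reverse=` rather than a comparison function.

```python
def xmlToList(xmlSource, elementMap, sortKey=None, sortOrder='asc'):
    """Parse a flat, two-level XML string into a list of dictionaries."""
    # Split on the parent start tag and drop the leading declaration/wrapper chunk.
    parents = xmlSource.split('<' + elementMap['parent'] + '>')
    parents.pop(0)

    xmlList = []
    for parent in parents:
        xmlDict = {}
        for child in elementMap['children']:
            xmlDict[child] = None              # default when a tag is missing
            openPos = parent.find('<' + child)
            if openPos == -1:
                continue
            # Skip to the end of the start tag, ignoring any attributes.
            openPos = parent.find('>', openPos)
            closePos = parent.find('</' + child + '>', openPos)
            if openPos != -1 and closePos != -1:
                xmlDict[child] = parent[openPos + 1:closePos]
        xmlList.append(xmlDict)

    if sortKey:
        # Treat missing values as empty strings so they can be compared.
        xmlList.sort(key=lambda record: record[sortKey] or '',
                     reverse=(sortOrder.lower() == 'desc'))
    return xmlList


# Hypothetical input, invented for this example.
elementMap = {'parent': 'record', 'children': ('created', 'subject')}
xmlSource = (
    '<records>'
    '<record><created>2009-01-02</created><subject>Second</subject></record>'
    '<record><created>2009-01-01</created><subject>First</subject></record>'
    '</records>'
)

for record in xmlToList(xmlSource, elementMap, sortKey='created'):
    print(record['created'], record['subject'])
```

Passing `sortOrder='desc'` reverses the order, and leaving `sortKey` as None keeps the records in document order.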