
Convert XML to a List of Dictionaries in Python using built-in data types and functions

Parsing a basic XML document using only Python's built-in data types and functions





There are a few reasons for resorting to Python's built-in functions and data types to parse an XML document.

  • Your Python build was not compiled with the XML libraries.
  • You're working within an application, such as Zope, which restricts the Python libraries that can be imported.
  • Your XML source is very basic (no element attributes, 2 levels deep), so you don't need advanced parsing.
  • Performance: for small, regular documents, plain string operations can avoid the overhead of a full parser.

 

The commented code below works only for the most basic XML documents. Documents whose record elements carry attributes, or that are nested more than 2 levels deep, will not parse correctly.
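
The limitation can be seen in a short sketch. The sample strings here are made up for illustration. Because the approach splits the source on the literal text `<record>`, a record element that carries an attribute is never matched:

```python
# Hypothetical sample documents; the values are illustrative only.
plain = ("<records>"
         "<record><subject>a</subject></record>"
         "<record><subject>b</subject></record>"
         "</records>")
with_attrs = '<records><record id="1"><subject>a</subject></record></records>'

delimiter = "<record>"

# Splitting on the literal start tag yields one leading chunk plus one chunk per record.
plain_chunks = plain.split(delimiter)
attr_chunks = with_attrs.split(delimiter)

print(len(plain_chunks) - 1)  # number of records found in the plain document
print(len(attr_chunks) - 1)   # number of records found when the tag has an attribute
```

In the second case the split finds no delimiter at all, so every record is silently dropped rather than reported as an error.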

The result is a list of dictionaries. Each item in the list corresponds to one record in the XML structure, and each key in a dictionary corresponds to a child element within that record. A child element that is missing from a record is stored with a value of None.
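
To make that shape concrete, here is a small illustration. The sample document and its values are invented for this example. Only the element names (record, created, description, subject, body) come from this page, and the result shown assumes the list is sorted ascending by 'created':

```python
# Hypothetical two-record source document (values invented for illustration).
sample_xml = (
    "<records>"
    "<record><created>2009-01-02</created><subject>Second</subject></record>"
    "<record><created>2009-01-01</created><subject>First</subject>"
    "<body>Hello</body></record>"
    "</records>"
)

# The structure expected for that document: one dictionary per <record>,
# one key per child element, and None where a child element is absent.
expected = [
    {'created': '2009-01-01', 'description': None, 'subject': 'First', 'body': 'Hello'},
    {'created': '2009-01-02', 'description': None, 'subject': 'Second', 'body': None},
]

# Each record is then an ordinary dictionary.
for record in expected:
    print(record['created'], record['subject'])
```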

You'll most likely wrap the code below in a function, so that xmlSource, sortKey and sortOrder can be passed in as arguments; the final return statement assumes it is running inside one.

 

xmlSource = "" # define your XML source here

# define the parent (record) element and the child elements to extract from each record
elementMap = {'parent' : 'record',
                       'children' : (
                            'created',
                            'description',
                            'subject',
                            'body')
              }

sortKey = 'created'           # must be one of the children listed in elementMap
sortOrder = 'asc'             # asc for Ascending, desc for Descending

# turn the xmlSource string into a list, using the 'parent' element as a delimiter
parents = xmlSource.split('<' + elementMap['parent'] + '>')

# remove first element in list, as it contains only encoding and body or wrapper element information
parents.pop(0)

# instantiate list and contained dictionary element
xmlList = []
xmlDict = {}

# performance - "grab" children key from elementMap, so it doesn't need to be retrieved from the elementMap at each loop iteration
children = elementMap['children']

# loop through parent element list
for parent in parents:

    #clear the dictionary so it can be repopulated by the next parent record
    xmlDict = {}
   
    # identifying children and storing them to the xmlDict with each iteration
    for child in children:
       
        # find child start tag
        start = parent.find('<' + child)

        # if child tag cannot be found, add it to the xmlDict with a value of None
        # if child is found, attempt to locate closing tag
        if start == -1:
            xmlDict[child] = None
        else:
            # find > to complete the open tag, skipping any attributes of the child element
            start = parent.find('>', start)

            # look for the closing tag only after the end of the open tag
            closed = parent.find('</' + child + '>', start)

            # if closing tag cannot be found, add child to the xmlDict with a value of None
            # if closing tag is found, add child to the xmlDict with data contained
            if closed == -1:
                xmlDict[child] = None
            else:
                # locate contained data and store to xmlDict
                childData = parent[start + 1 : closed]
                xmlDict[child] = childData
   
    # finally, append the xmlDict dictionary to the xmlList
    xmlList.append(xmlDict)
   

# sort the list of dictionaries according to the sortKey and sortOrder
# (a missing value, stored as None, is sorted as an empty string)
if sortKey:
    xmlList.sort(key=lambda record: record[sortKey] or '',
                 reverse=(sortOrder.lower() == 'desc'))
       

return xmlList
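
The steps above can be wrapped in a function, as suggested earlier. The following is a minimal sketch under stated assumptions, not a definitive implementation: the name `xml_to_dicts`, its parameter names, and the sample document are all hypothetical. It uses a key-based sort so that it also runs on Python 3, and it treats a missing sort value as an empty string.

```python
def xml_to_dicts(xml_source, parent, children, sort_key=None, sort_order='asc'):
    """Parse a flat XML string into a list of dictionaries (hypothetical helper)."""
    # Split on the parent start tag; the first chunk holds only the
    # declaration and wrapper element, so it is dropped.
    chunks = xml_source.split('<' + parent + '>')[1:]

    records = []
    for chunk in chunks:
        record = {}
        for child in children:
            value = None
            start = chunk.find('<' + child)
            if start != -1:
                # Move to the end of the start tag, past any attributes.
                tag_end = chunk.find('>', start)
                end = chunk.find('</' + child + '>', tag_end)
                if tag_end != -1 and end != -1:
                    value = chunk[tag_end + 1:end]
            # Missing or unterminated children are stored as None.
            record[child] = value
        records.append(record)

    if sort_key:
        records.sort(key=lambda r: r[sort_key] or '',
                     reverse=(sort_order.lower() == 'desc'))
    return records


# Usage with an invented sample document.
sample = (
    '<?xml version="1.0"?><records>'
    '<record><created>2009-01-02</created><subject>Second</subject></record>'
    '<record><created>2009-01-01</created><subject>First</subject></record>'
    '</records>'
)
rows = xml_to_dicts(sample, 'record', ('created', 'subject', 'body'), sort_key='created')
for row in rows:
    print(row)
```

Because the values are plain strings, the sort is lexicographic. Dates in a year-month-day form such as the ones in this sample happen to sort correctly that way; other date formats would need to be converted first.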




