How to properly format XML differences in a dictionary using Python 3.4.4

I need help formatting the output of a script that compares two XML files.

test1.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>RRX986</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="navy_cardigan.jpg">Nay</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burundy</color_swatch>
         </size>
      </catalog_item>
   </product>
</catalog>

test2.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">pink</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>peac</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">lost</color_swatch>
            <color_swatch image="navy_cardigan.jpg">pet</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">hey</color_swatch>
         </size>
      </catalog_item>
   </product>
</catalog>

The code below prints the differences between the two XML files as a dictionary, but the output does not say which file each value came from, and the differing values are jumbled together.

code:

from lxml import etree
from collections import defaultdict
from pprintpp import pprint as pp

root_1 = etree.parse('test1.xml').getroot()
root_2 = etree.parse('test2.xml').getroot()

d1, d2 = [], []
for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text and x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

for node in root_2.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text and x.text.strip():
            item[x.tag].append(x.text.strip())
    d2.append(dict(item))

d1 = sorted(d1, key = lambda x: x['item_number'])
d2 = sorted(d2, key = lambda x: x['item_number'])

res_dict = defaultdict(list)
for x, y in zip(d1, d2):
    for key1, key2 in zip(x.keys(), y.keys()):
        if key1 == key2 and sorted(x[key1]) != sorted(y[key2]):
            res_dict[x['item_number'][0]].append({key1: list(set(x[key1]) ^ set(y[key2]))})

if not res_dict:
    print('Data is same in both XML files')
else:
    pp(dict(res_dict))

This is the current output when the files differ. It has no file names, and the differences are jumbled:

{'QWZ5671': [{'color_swatch': ['Red', 'pink']}],
 'RRX986': [{'item_number': ['RRX986', 'peac']},
            {'color_swatch': ['hey', 'pet', 'Burundy', 'Nay', 'lost', 'Red']}]}
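
The jumbling appears to come from the `set(x[key1]) ^ set(y[key2])` step in the code above: a symmetric difference merges both sides into one unordered set, so neither the original order nor the source file survives. A small sketch using the color_swatch values of the Women's item, where `old`, `new`, `merged`, and `paired` are illustrative names that do not appear in the original code:

```python
# color_swatch values of the Women's catalog_item in each file.
old = ['Red', 'Nay', 'Burundy']   # from test1.xml
new = ['lost', 'pet', 'hey']      # from test2.xml

# Symmetric difference: a single unordered set with no record of the source file.
merged = set(old) ^ set(new)

# Pairing by position keeps document order and labels each side.
paired = [
    {'test1.xml': a, 'test2.xml': b}
    for a, b in zip(old, new)
    if a != b
]
```

Pairing by position assumes both files list the values in the same order and in equal numbers; `zip` silently drops any unmatched trailing values.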

Expected output:

{'QWZ5671': [{'color_swatch': [{'test1.xml': 'Red', 'test2.xml': 'pink'}]}],
 'RRX986': [{'item_number': [{'test1.xml': 'RRX986', 'test2.xml': 'peac'}]},
            {'color_swatch': [{'test1.xml': 'Burundy', 'test2.xml': 'hey'},
                              {'test1.xml': 'Nay', 'test2.xml': 'pet'},
                              {'test1.xml': 'Red', 'test2.xml': 'lost'}]}]}




Jun 19, 2020 in Python by Abhinandan
• 120 points

edited Jun 19, 2020 by Abhinandan 506 views
Hi, @Abhinandan,

Did you face any error while executing your code?
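
One possible way to get output labelled by file name is sketched below. It is not the original code: it swaps lxml for the standard library's `xml.etree.ElementTree`, compares element text only (attributes are ignored), and pairs `catalog_item` elements and their values by position, which assumes both files keep the same order. The names `collect` and `diff_catalogs` are hypothetical helpers introduced for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect(item):
    """Map each tag under one catalog_item to its text values, in document order."""
    values = defaultdict(list)
    for node in item.iter():
        text = (node.text or '').strip()  # node.text may be None
        if text:
            values[node.tag].append(text)
    return values


def diff_catalogs(root_1, root_2, name_1='test1.xml', name_2='test2.xml'):
    """Pair catalog_items by position and label each differing value per file."""
    result = defaultdict(list)
    items_1 = root_1.findall('.//catalog_item')
    items_2 = root_2.findall('.//catalog_item')
    for item_1, item_2 in zip(items_1, items_2):
        v1, v2 = collect(item_1), collect(item_2)
        # Assumption: every catalog_item has an item_number; the value from
        # the first file is used as the key of the result.
        key = v1['item_number'][0]
        for tag in v1:
            pairs = [
                {name_1: a, name_2: b}
                for a, b in zip(v1[tag], v2.get(tag, []))
                if a != b
            ]
            if pairs:
                result[key].append({tag: pairs})
    return dict(result)
```

Calling `diff_catalogs(ET.parse('test1.xml').getroot(), ET.parse('test2.xml').getroot())` on the two files above should return one `{file name: value}` dictionary per differing value, grouped under each tag. The pairs come out in document order (Red, Nay, Burundy) rather than the order shown in the expected output, and an empty result means no text differences were found.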
