Generating PASCAL VOC-format XML annotation files with Python

Annotation files in the PASCAL VOC dataset are XML. For py-faster-rcnn or SSD, the fields in the following example are usually sufficient:


    <annotation>
      <folder>GTSDB</folder>
      <filename>000001.webp</filename>
      <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
      </size>
      <object>
        <name>mouse</name>
        <difficult>0</difficult>
        <bndbox>
          <xmin>99</xmin>
          <ymin>358</ymin>
          <xmax>135</xmax>
          <ymax>375</ymax>
        </bndbox>
      </object>
    </annotation>
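To make the nesting concrete, here is a minimal sketch (not from the original post) that parses the example above with the standard library's xml.etree.ElementTree and pulls out the image size and each box. The helper name read_voc is hypothetical; it assumes every object has name, difficult and bndbox children, as in the example.

```python
import xml.etree.ElementTree as ET

# The example annotation from above, as a string
VOC_XML = """<annotation>
  <folder>GTSDB</folder>
  <filename>000001.webp</filename>
  <size>
    <width>500</width>
    <height>375</height>
    <depth>3</depth>
  </size>
  <object>
    <name>mouse</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>99</xmin>
      <ymin>358</ymin>
      <xmax>135</xmax>
      <ymax>375</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_voc(xml_text):
    """Hypothetical helper: return (filename, (width, height, depth), objects)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    dims = tuple(int(size.find(tag).text) for tag in ("width", "height", "depth"))
    objects = []
    # <annotation> holds one <object> child per bounding box
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.find("name").text,
            "difficult": int(obj.find("difficult").text),
            "bbox": tuple(int(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return root.find("filename").text, dims, objects


filename, dims, objects = read_voc(VOC_XML)
print(filename, dims, objects)
# 000001.webp (500, 375, 3) [{'name': 'mouse', 'difficult': 0, 'bbox': (99, 358, 135, 375)}]
```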

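How do you read bbox information from a csv or txt file and produce an XML annotation file? Writing the file line by line as plain strings certainly works, but it is awkward to change later, and getting the tabs and spaces of the indentation right is a hassle.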
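For the reading half of the question, a minimal sketch using the standard csv module that groups box rows by image. The column layout (filename,width,height,depth,name,xmin,ymin,xmax,ymax with a header row), the sample rows, and the helper name load_boxes are all assumptions for illustration; adjust them to your own file. Passing delimiter="\t" lets the same code read tab-separated txt files.

```python
import csv
import io

# Hypothetical CSV layout and sample rows (assumptions, not from the original post)
SAMPLE_CSV = """filename,width,height,depth,name,xmin,ymin,xmax,ymax
000001.webp,500,375,3,mouse,99,358,135,375
000002.webp,500,375,3,mouse,10,20,60,80
000002.webp,500,375,3,cat,100,120,200,240
"""


def load_boxes(f, delimiter=","):
    """Hypothetical helper: group rows by image; use delimiter='\\t' for tab-separated txt."""
    images = {}  # dicts keep insertion order (Python 3.7+)
    for row in csv.DictReader(f, delimiter=delimiter):
        # The first row seen for an image also records its size
        img = images.setdefault(row["filename"], {
            "size": (int(row["width"]), int(row["height"]), int(row["depth"])),
            "objects": [],
        })
        img["objects"].append({
            "name": row["name"],
            "bbox": tuple(int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return images


images = load_boxes(io.StringIO(SAMPLE_CSV))
for name, img in images.items():
    print(name, img["size"], len(img["objects"]))
```

With a real file, open it with open(path, newline="") and pass the file object instead of io.StringIO.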


Building the tree with lxml is easier to maintain. Install it first with: pip install lxml (prefix sudo only for a system-wide install).

    # The standard library's xml.etree.ElementTree offers the same Element/SubElement API,
    # but its tostring() has no pretty_print argument.
    from lxml.etree import Element, SubElement, tostring

    node_root = Element('annotation')
    node_folder = SubElement(node_root, 'folder')
    node_folder.text = 'GTSDB'
    node_filename = SubElement(node_root, 'filename')
    node_filename.text = '000001.webp'

    # Image size
    node_size = SubElement(node_root, 'size')
    node_width = SubElement(node_size, 'width')
    node_width.text = '500'
    node_height = SubElement(node_size, 'height')
    node_height.text = '375'
    node_depth = SubElement(node_size, 'depth')
    node_depth.text = '3'

    # One <object> per bounding box
    node_object = SubElement(node_root, 'object')
    node_name = SubElement(node_object, 'name')
    node_name.text = 'mouse'
    node_difficult = SubElement(node_object, 'difficult')
    node_difficult.text = '0'
    node_bndbox = SubElement(node_object, 'bndbox')
    node_xmin = SubElement(node_bndbox, 'xmin')
    node_xmin.text = '99'
    node_ymin = SubElement(node_bndbox, 'ymin')
    node_ymin.text = '358'
    node_xmax = SubElement(node_bndbox, 'xmax')
    node_xmax.text = '135'
    node_ymax = SubElement(node_bndbox, 'ymax')
    node_ymax.text = '375'

    # pretty_print=True adds newlines and indentation;
    # encoding='unicode' returns a str instead of bytes
    xml_str = tostring(node_root, pretty_print=True, encoding='unicode')
    print(xml_str)
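Writing every node by hand does not scale to many boxes per image. A minimal sketch of a reusable builder follows; it swaps in the standard library's xml.etree.ElementTree (so no extra install is needed) and uses ET.indent for pretty-printing, which assumes Python 3.9 or later. The function name build_voc, its argument layout, and the hard-coded difficult value of 0 are assumptions for illustration, not part of the original post.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_voc(filename, size, objects, folder="GTSDB"):
    """Hypothetical helper: objects is a list of (name, (xmin, ymin, xmax, ymax))."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    node_size = ET.SubElement(root, "size")
    for tag, value in zip(("width", "height", "depth"), size):
        ET.SubElement(node_size, tag).text = str(value)
    # One <object> per bounding box; difficult is assumed to be 0
    for name, bbox in objects:
        node_object = ET.SubElement(root, "object")
        ET.SubElement(node_object, "name").text = name
        ET.SubElement(node_object, "difficult").text = "0"
        node_bndbox = ET.SubElement(node_object, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), bbox):
            ET.SubElement(node_bndbox, tag).text = str(value)
    return ET.ElementTree(root)


tree = build_voc("000001.webp", (500, 375, 3), [("mouse", (99, 358, 135, 375))])
ET.indent(tree)  # pretty-print in place with two-space indentation (Python 3.9+)
xml_str = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_str)

# Write the file; a temp directory is used here only to keep the example self-contained
out_path = os.path.join(tempfile.gettempdir(), "000001.xml")
tree.write(out_path, encoding="utf-8")
```

In practice you would loop over the parsed csv/txt records, call build_voc once per image, and write each tree next to its image with the same base name.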
