Word Xml Translation

Image XML Translator

Owner: alanlynn

Goal

Once best practice violations have been detected, we need to help the user fix them. That means highlighting the relevant content and, where possible, changing the underlying document itself to remove the violation.

We currently detect violations by parsing the document's XML, but working with the XML alone has two limitations. First, the XML is handed to us as a read-only string, so we can't change the document through it. Second, the nodes returned by our XPath queries carry no sense of position within the XML; a query simply returns the nodes that match the given expression. We therefore can't tell whether an image without alternative text is the 4th image in the document or the last.

The object model overcomes both limitations, but one task must be completed first to bridge the XML and the object model: we must mark up the XML with our own metadata so that nodes in the XML can be correlated with objects in the object model. For instance, we should go through all images in the XML and mark them as the first image, second image, and so on. Then, when we find a violation on a given image, the metadata we added tells us which image it is in the document.
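As a rough illustration of that last step, the sketch below assumes the XML has already been marked up. The `alt` attribute is a hypothetical stand-in for however alternative text is actually stored, the `inlineCounter` attribute follows the example in the Notes, and the Python standard library stands in for whatever XML API the implementation ends up using.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Already-marked XML; "alt" is a hypothetical stand-in for alternative text.
MARKED = f"""
<w:document xmlns:w="{W_NS}">
    <w:drawing name="bumble.jpg" alt="A bee" inlineCounter="0" />
    <w:drawing name="bar.jpg" inlineCounter="1" />
</w:document>
"""

def images_missing_alt_text(root):
    """Return the marker values of drawings that have no alt text."""
    # ElementTree's limited XPath has no not(@attr) predicate,
    # so the absence test is done in Python.
    return [
        int(node.get("inlineCounter"))
        for node in root.iterfind(".//w:drawing", {"w": W_NS})
        if not node.get("alt")
    ]

root = ET.fromstring(MARKED)
print(images_missing_alt_text(root))  # [1]
```

The returned index is what would be handed to the object model to locate the offending image.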

Implementation

The function will take an XML representation of a document, walk through it, and mark each image with an attribute that denotes its position in the object model (e.g. shapes[i]). This can be done by marking images sequentially with an incrementing counter.
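A minimal sketch of the counter approach, written in Python for brevity rather than against the XMLDocument API. It assumes that the order of images in the object model matches document order in the XML, which should be verified. The function name `mark_inline_images` is made up for this example; the default attribute name follows the example in the Notes.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def mark_inline_images(root, attr="inlineCounter"):
    """Stamp each <w:drawing> with its zero-based position in document order.

    Returns the number of drawings that were marked.
    """
    count = 0
    # Element.iter() walks the tree depth-first, i.e. in document order.
    for drawing in root.iter(f"{{{W_NS}}}drawing"):
        drawing.set(attr, str(count))
        count += 1
    return count

doc = ET.fromstring(
    f'<w:document xmlns:w="{W_NS}">'
    '<w:drawing name="bumble.jpg" />'
    '<w:drawing name="bar.jpg" />'
    "</w:document>"
)
print(mark_inline_images(doc))  # 2
```

The counter starts at 0 to match the example in the Notes. If the object-model collections turn out to be 1-based, the lookup side will need an offset.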

Input

We will pass an XMLDocument and an XMLNamespaceManager to this function. Alternatively, a string representing the XML could be passed, whichever is easiest for you.

Output

A transformed XMLDocument, or a string, representing the newly marked-up document.
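For the string-in, string-out variant, the contract might look like the sketch below. It is Python rather than the XMLDocument API, and the function name `mark_up_xml` is an assumption. One detail worth noting in this particular library: the namespace prefix has to be registered, or the serialized output uses a generated prefix instead of `w`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def mark_up_xml(xml_text):
    """Take document XML as a string and return the marked-up XML as a string."""
    # Keep the familiar "w" prefix on output; otherwise ElementTree
    # serializes the namespace with a generated prefix such as "ns0".
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(xml_text)
    for index, drawing in enumerate(root.iter(f"{{{W_NS}}}drawing")):
        drawing.set("inlineCounter", str(index))
    return ET.tostring(root, encoding="unicode")

source = (
    f'<w:document xmlns:w="{W_NS}">'
    '<w:drawing name="bumble.jpg" />'
    '<w:drawing name="bar.jpg" />'
    "</w:document>"
)
print(mark_up_xml(source))
```

Returning a new string leaves the caller's original XML untouched, which fits the read-only nature of the input.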

Notes

Images fall into two main structures in the Word object model: shapes and inlineshapes. Each type of image belongs to one of the two. Nick worked out which types go where (listed below, along with equations, which sit in OMath), so the corresponding XML should be marked up accordingly.

  1. Inline Shapes
    • Images
    • Clip Art
    • SmartArt
    • Chart
    • WordArt
  2. Shapes
    • Shapes
    • Textboxes
  3. OMath
    • Equations

I believe images, clip art, SmartArt, charts, and WordArt are contained within a <w:drawing> tag, but you should check this. Shapes are contained within a <v:shape> tag.

For example, if there are two images in the document:

<document>
    <w:drawing name="bumble.jpg" />
    <w:drawing name="bar.jpg" />
</document>

Since these are both inlineshapes, the XML should look as follows after the algorithm runs:

<document>
    <w:drawing name="bumble.jpg" inlineCounter="0" />
    <w:drawing name="bar.jpg" inlineCounter="1" />
</document>
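
Extending the example to cover shapes as well, one possible approach is to keep an independent counter per object-model collection. This is only a sketch in Python: the `shapeCounter` attribute name and the `mark_images` function are made up for illustration, and the tag-to-collection mapping is the unverified one described above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"

# Tag -> marker attribute. "shapeCounter" is a name made up for this sketch.
MARKERS = {
    f"{{{W_NS}}}drawing": "inlineCounter",
    f"{{{V_NS}}}shape": "shapeCounter",
}

def mark_images(root):
    """Number inline shapes and shapes independently, in document order."""
    counts = {attr: 0 for attr in MARKERS.values()}
    for element in root.iter():
        attr = MARKERS.get(element.tag)
        if attr is not None:
            element.set(attr, str(counts[attr]))
            counts[attr] += 1
    return counts

doc = ET.fromstring(
    f'<w:document xmlns:w="{W_NS}" xmlns:v="{V_NS}">'
    '<w:drawing name="bumble.jpg" />'
    '<v:shape id="box" />'
    '<w:drawing name="bar.jpg" />'
    "</w:document>"
)
print(mark_images(doc))  # {'inlineCounter': 2, 'shapeCounter': 1}
```

Separate counters keep each marker aligned with its own collection, so a shape sitting between two images does not shift the image numbering. Real documents may complicate the mapping, so the tag table should be checked against actual Word output before relying on it.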