Merging multiple attribute instances in a DOM document - XPath and Node-sets

9 months ago

Stumbling into a Rabbit Hole

This article is the result of a rabbit hole I went down a few days ago while looking at a situation where a specific attribute had multiple instances/references within a single DOM document. A Rule was acting on the values in the first occurrence when it should have been working on the values in the second instance. This led to an obscure failure that had the customer puzzled as to why it worked sometimes and not other times.

A simplified example of the type of DOM document I am referring to is shown below, where there are three instances of the Description attribute:

<input>
  <modify class-name="user" qualified-src-dn="O=Vault\OU=Users\CN=TestUser" src-dn="\LAB1-VAULT2\Vault\Users\TestUser" src-entry-id="78855" timestamp="1590526996#1">
    <modify-attr attr-name="Description">
      <add-value>
        <value type="string">Line 1</value>
      </add-value>
      <add-value>
        <value type="string">Line 5</value>
      </add-value>
    </modify-attr>
    <modify-attr attr-name="Description">
      <remove-all-values/>
      <add-value>
        <value type="string">Line 2</value>
      </add-value>
      <add-value>
        <value type="string">Line 3</value>
      </add-value>
    </modify-attr>
    <modify-attr attr-name="Description">
      <remove-value>
        <value type="string">Line 1</value>
      </remove-value>
      <remove-value>
        <value type="string">Line 2</value>
      </remove-value>
      <remove-all-values/>
      <add-value>
        <value type="string">Line 3</value>
      </add-value>
      <add-value>
        <value type="string">Line 4</value>
      </add-value>
    </modify-attr>
  </modify>
</input>

In a complex (regular?) scenario, those three instances would be mixed in with many other attributes within a single DOM document. When reading through a driver trace, it can be challenging to make sure that all the instances of a given attribute are found and that the final outcome is tracked correctly. If a developer assumed that only a single instance exists and acted on that, it could lead to some unexpected results.
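
When reviewing a captured document outside of the engine, a quick count of instances per attribute can flag this situation. The following is a minimal sketch in Python, not part of the driver policy; the trimmed-down document, the "Surname" attribute and the `duplicate_attrs` helper are made up for illustration, and it assumes the document holds a single modify operation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, trimmed-down XDS document for illustration only.
XDS = """
<input>
  <modify class-name="user">
    <modify-attr attr-name="Description">
      <add-value><value type="string">Line 1</value></add-value>
    </modify-attr>
    <modify-attr attr-name="Surname">
      <add-value><value type="string">Smith</value></add-value>
    </modify-attr>
    <modify-attr attr-name="Description">
      <remove-all-values/>
    </modify-attr>
  </modify>
</input>
"""


def duplicate_attrs(xds_text):
    """Return {attr-name: instance count} for attributes that occur more than once."""
    root = ET.fromstring(xds_text)
    counts = Counter(m.get("attr-name") for m in root.iter("modify-attr"))
    return {name: n for name, n in counts.items() if n > 1}


print(duplicate_attrs(XDS))  # prints: {'Description': 2}
```

Any attribute reported here deserves a closer look in the trace, since a Rule that reads only the first instance may be working with stale values.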

My Perspective on Why This Occurs

It should be noted that I believe this type of situation is caused by what I consider to be poor coding practices. If changes are being made to an attribute, the attribute instance should be modified in place or replaced completely with new values. The goal is to have a single instance of each attribute carried in a DOM document. It is incumbent on the developer to do any clean up work on an existing attribute when adding another instance to the DOM document. This will avoid having multiple instances of a specific attribute in a single DOM document. (I suspect there may well be specific cases where multiple instances are needed. I have just not run into such a situation yet. Someone in the community can correct me here.)

I know it can be challenging when projects are rushed or behind, but I firmly believe that messy code will return to 'byte' you later if not managed initially. Some may ask why this clean up matters when in most cases the changes are processed sequentially and the results will come out the way the developer wanted them to. I believe that attitude is mistaken, and that belief is backed up by having worked through a number of troubleshooting cases where messy code and multiple instances of the same attribute in a DOM document existed. I encourage everyone to practice clean and safe coding.

The Rabbit Hole gets Deeper

Though this situation may not be encountered often, my mind went wandering as I wanted to figure out how I could handle this scenario quickly. As I mapped out and put together an initial solution, I started to realize that additional variations could also exist and that they had to be addressed as well. That led me to revise the solution and incorporate further options.

For example, I started off by processing every instance and resetting the add and remove value lists whenever I encountered a <remove-all-values/> tag, since this tag negates anything calculated up to that point. Processing the prior instances obviously became a small waste of time. I then realized there could be rare situations where that tag occurs multiple times across various instances, so that was one more case to handle. That led me to wonder about identifying the last occurrence of that tag and simply starting my processing from that point. My little Rabbit Hole continued to grow deeper as I worked through the variations.

When I finally stopped, I thought I had covered all the possible cases, and then realized there was one case that I had missed. More on that later as an exercise for others to consider.

Though this solution may be like closing the barn door to hide the mess, I see it as a reference on how to manage similar coding challenges. This example could be applied in whole or in part in other areas to work with node-sets more effectively.

Reference Credits

I want to call attention to a great article that helped me in working through this solution. The article Manipulating node-sets in IDM via Set Operations covers some great concepts on working with node-sets that are easily generated in a driver policy. Being able to determine the Union, Intersection and Difference between node-sets can quickly speed up and simplify your policy in a variety of ways. Without these XPath options you will require multiple for-each loops and many other string comparison tests. I was generating far more complex and time-consuming code before I fully understood the concepts covered by Alex. Working through this solution finally helped me grow my understanding of these options into something far more useful than what I had been applying before. Hopefully, this article is a helpful addendum to Alex's article that expands your understanding of these concepts with additional examples of their use.

A second call out goes to the article Using XPATH to Get the Position of a Node in a Node Set. Ironically, I found the Stack Overflow article it references first, while trying to figure out how to get the position of the last occurrence of a <remove-all-values/> tag in my test DOM documents. Between the two sources I was able to resolve that question nicely. (Look for the "last-remove-all" local variable.)

Finally, I want to say thank you to all the folks who have contributed to this community over the years. Lots of examples shared to learn from and hopefully more in the community will share their tips and tricks as well.

High Level Logic:

Process a single DOM document to read in all instances of a specific attribute and merge them into a single instance that provides the same effective changes. In this example I use the attribute "Description", though any attribute could be used. This code could also be modified and expanded to apply to a list of attributes rather than just a single one.

The processing has to be done in the order of the attribute instances within the DOM document to maintain the accuracy of the changes currently embedded in the DOM document.

If there is a <remove-all-values/> tag, then any instance of the target attribute prior to that occurrence can be ignored as this tag will override all prior value changes. So start processing at the occurrence of the last <remove-all-values/> tag. If this tag exists, set a Remove All Flag so that we make sure to add this tag in the final merged instance we will generate.
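
The starting-point logic can be modelled outside of the engine. The following Python sketch is an illustration under stated assumptions, not the policy itself: the sample modify operation, the interleaved "Surname" attribute and the `instances_from_last_remove_all` helper are hypothetical. It counts positions among the Description instances only, so other attributes in between do not shift the starting point.

```python
import xml.etree.ElementTree as ET

# Hypothetical modify operation; "Surname" is interleaved on purpose.
MODIFY = ET.fromstring("""
<modify class-name="user">
  <modify-attr attr-name="Description">
    <add-value><value type="string">Line 1</value></add-value>
  </modify-attr>
  <modify-attr attr-name="Surname">
    <add-value><value type="string">Smith</value></add-value>
  </modify-attr>
  <modify-attr attr-name="Description">
    <remove-all-values/>
    <add-value><value type="string">Line 2</value></add-value>
  </modify-attr>
  <modify-attr attr-name="Description">
    <add-value><value type="string">Line 3</value></add-value>
  </modify-attr>
</modify>
""")


def instances_from_last_remove_all(modify, attr_name):
    """Return (instances to process, remove_all_flag) for one attribute."""
    instances = [m for m in modify.findall("modify-attr")
                 if m.get("attr-name") == attr_name]
    start, remove_all = 0, False
    for index, instance in enumerate(instances):
        if instance.find("remove-all-values") is not None:
            # A later match overwrites an earlier one, so we end on the last.
            start, remove_all = index, True
    return instances[start:], remove_all


todo, flag = instances_from_last_remove_all(MODIFY, "Description")
print(len(todo), flag)  # prints: 2 True
```

Here the first Description instance is skipped, and processing starts at the instance that carries the last <remove-all-values/> tag.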

For each instance gather up all <remove-value> values and <add-value> values and then process each with the following conditions:

  • If the <remove-value> Value does not exist in the Remove Values node-set or in the Add Values node-set - then it can be added to the Remove Values node-set.
  • If the <remove-value> Value exists in the Add Values node-set, remove that value from the Add Values node-set. (No need to add it to the Remove Values node-set)
  • If the <remove-value> Value is in the Remove Values node-set, there is no need to add it again.
  • If the <add-value> Value does not exist in the Add Values node-set, add it to the Add Values node-set.
  • If the <add-value> Value is in the Add Values node-set, there is no need to add it again.
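
The conditions above can be sketched with plain Python lists standing in for the two node-sets. This is a model of the logic only, assuming each attribute instance has already been reduced to a pair of value lists; the `merge_changes` name and the sample data are hypothetical.

```python
def merge_changes(instances):
    """Merge (removes, adds) pairs, given in document order, into two lists."""
    remove_values, add_values = [], []
    for removes, adds in instances:
        for value in removes:
            if value in add_values:
                # A pending add is cancelled; nothing goes to the remove list.
                add_values.remove(value)
            elif value not in remove_values:
                remove_values.append(value)
        for value in adds:
            if value not in add_values:
                add_values.append(value)
    return remove_values, add_values


# Two hypothetical instances of the same attribute.
instances = [
    (["Line 1"], ["Line 2", "Line 3"]),
    (["Line 2", "Line 1"], ["Line 3", "Line 4"]),
]
print(merge_changes(instances))  # prints: (['Line 1'], ['Line 3', 'Line 4'])
```

"Line 2" is added by the first instance and removed by the second, so it drops out of both lists, while the repeated remove of "Line 1" and the repeated add of "Line 3" are each recorded only once.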

Once the Remove All Flag, Remove Values node-set and Add Values node-set are generated, strip all existing instances of the attribute from the DOM document.

If any of the Remove All Flag, Remove Values node-set, or Add Values node-set has a value, create a new instance of the attribute in the DOM document with the following:

  • The <remove-all-values/> tag if the Remove All Flag is set
  • The Remove Values node-set in a <remove-value> tag if Remove Values node-set is not empty and the Remove All Flag is not set. (No need to add values to be removed if the Flag is set and all values are being removed.)
  • The Add Values node-set in a <add-value> tag if Add Values node-set is not empty
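
To make the shape of the merged instance concrete, here is a small Python sketch that builds it with ElementTree. It is only a model of the output, not how the policy builds it; the `build_merged` helper is hypothetical, and the sample inputs are what tracing the logic by hand over the example document appears to yield.

```python
import xml.etree.ElementTree as ET


def build_merged(attr_name, remove_all, remove_values, add_values):
    """Return one merged <modify-attr> element, or None if there is nothing to do."""
    if not (remove_all or remove_values or add_values):
        return None
    attr = ET.Element("modify-attr", {"attr-name": attr_name})
    if remove_all:
        ET.SubElement(attr, "remove-all-values")
    if remove_values and not remove_all:
        # Individual removes are redundant when everything is removed anyway.
        removes = ET.SubElement(attr, "remove-value")
        for text in remove_values:
            ET.SubElement(removes, "value", {"type": "string"}).text = text
    if add_values:
        adds = ET.SubElement(attr, "add-value")
        for text in add_values:
            ET.SubElement(adds, "value", {"type": "string"}).text = text
    return attr


merged = build_merged("Description", True, ["Line 1", "Line 2"], ["Line 3", "Line 4"])
print(ET.tostring(merged, encoding="unicode"))
```

With the flag set, the result carries the <remove-all-values/> tag followed by a single <add-value> holding "Line 3" and "Line 4", and no <remove-value> element.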

Note: The Clone by XPath ("do-clone-xpath") action is a great option for adding a node-set that includes the <value> tags, which it will if you have generated the node-set at that level as shown in this example. I had to dig around to find a reference for this, as I was not seeing how to do it successfully. This is likely another area where my learning is limited until I have actually used the option a few times.

So with that logic, and some XPath node-set Union, Intersection and Difference testing, here is the final Rule to merge multiple instances of an attribute into a single instance within a DOM document. I have used the trace message at level=20 trick to provide inline comments on the code.

<rule>
  <description>Merge MV Attr Instances using Node-sets</description>
  <comment xml:space="preserve">Details in article reference:</comment>
  <comment name="author" xml:space="preserve">dstagg</comment>
  <comment name="version" xml:space="preserve">0.1</comment>
  <comment name="lastchanged" xml:space="preserve">2020-06-02</comment>
  <conditions>
    <and>
      <if-operation mode="nocase" op="equal">modify</if-operation>
      <if-class-name mode="nocase" op="equal">User</if-class-name>
    </and>
  </conditions>
  <actions>
    <do-set-local-variable name="lns-RemAttr" scope="policy">
      <arg-node-set/>
    </do-set-local-variable>
    <do-set-local-variable name="lns-AddAttr" scope="policy">
      <arg-node-set/>
    </do-set-local-variable>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Identify position of the last "remove-all-values", if there is one. XPath searches for the last occurrence of the "remove-all-values" entry and then calculates its position by counting the preceding instances of the same attribute using the "../preceding-sibling::" axis. Only instances of this attribute are counted so the result lines up with the position() test used below. That result is stored in the local variable.</token-text>
      </arg-string>
    </do-trace-message>
    <do-set-local-variable name="last-remove-all" scope="policy">
      <arg-string>
        <token-xpath expression='count(./modify-attr[@attr-name="Description"]/remove-all-values[last()]/../preceding-sibling::modify-attr[@attr-name="Description"])'/>
      </arg-string>
    </do-set-local-variable>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Iterate through the attributes, pulling "remove-all-values", "remove-value" and "add-value" entries. Starting point is the last "remove-all-values" entry. This ignores any preceding attribute instances as they are effectively wiped out by the "remove-all-values".</token-text>
      </arg-string>
    </do-trace-message>
    <do-for-each>
      <arg-node-set>
        <token-xpath expression='./modify-attr[@attr-name="Description"][position()>$last-remove-all]'/>
      </arg-node-set>
      <arg-actions>
        <do-trace-message level="20">
          <arg-string>
            <token-text xml:space="preserve">If this instance contains a "remove-all-values", flag that for later inclusion in the final merged attribute.</token-text>
          </arg-string>
        </do-trace-message>
        <do-if>
          <arg-conditions>
            <and>
              <if-xpath op="true">$current-node/remove-all-values</if-xpath>
            </and>
          </arg-conditions>
          <arg-actions>
            <do-set-local-variable name="lv-RemAllFlag" scope="policy">
              <arg-string>
                <token-text xml:space="preserve">true</token-text>
              </arg-string>
            </do-set-local-variable>
            <do-set-local-variable name="lns-RemAttr" scope="policy">
              <arg-node-set/>
            </do-set-local-variable>
            <do-set-local-variable name="lns-AddAttr" scope="policy">
              <arg-node-set/>
            </do-set-local-variable>
          </arg-actions>
          <arg-actions/>
        </do-if>
        <do-trace-message level="20">
          <arg-string>
            <token-text xml:space="preserve">Extract each "remove-value" entry and check that it is not already in the list of add or remove values. If it is not in either node-set list, add it to the "remove-value" node-set list. (double node-set test in one line) If it is in the "add-value" node-set list, remove the value from that node-set list.</token-text>
          </arg-string>
        </do-trace-message>
        <do-for-each>
          <arg-node-set>
            <token-xpath expression="$current-node/remove-value/value"/>
          </arg-node-set>
          <arg-actions>
            <do-set-local-variable name="lns-RemAttr" scope="policy">
              <arg-node-set>
                <token-local-variable name="lns-RemAttr"/>
                <token-xpath expression="$current-node[not(. = $lns-AddAttr)][not(. = $lns-RemAttr)]"/>
              </arg-node-set>
            </do-set-local-variable>
            <do-set-local-variable name="lns-AddAttr" scope="policy">
              <arg-node-set>
                <token-xpath expression="$lns-AddAttr[not(. = $current-node)]"/>
              </arg-node-set>
            </do-set-local-variable>
          </arg-actions>
        </do-for-each>
        <do-trace-message level="20">
          <arg-string>
            <token-text xml:space="preserve">Extract each "add-value" entry and check that it is not already in the list of add values. If it is not in the node-set list, add it to the "add-value" node-set list.</token-text>
          </arg-string>
        </do-trace-message>
        <do-for-each>
          <arg-node-set>
            <token-xpath expression="$current-node/add-value/value"/>
          </arg-node-set>
          <arg-actions>
            <do-set-local-variable name="lns-AddAttr" scope="policy">
              <arg-node-set>
                <token-local-variable name="lns-AddAttr"/>
                <token-xpath expression="$current-node[not(. = $lns-AddAttr)]"/>
              </arg-node-set>
            </do-set-local-variable>
          </arg-actions>
        </do-for-each>
      </arg-actions>
    </do-for-each>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Now that we have the flag for "remove-all-values", the "remove-value" node-set list and the "add-value" node-set list, remove all existing instances of the attribute from the DOM document.</token-text>
      </arg-string>
    </do-trace-message>
    <do-strip-op-attr name="Description"/>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Now build the merged attribute assignment from the flag and node-set lists. If the remove all flag is set, we start with a "remove-all-values" tag by using the set destination attribute which includes that tag. Otherwise we use the add destination attribute, only if there is a non-empty node-set for either the remove values or add values node-set lists. If there is no flag set and both remove and add value node-sets are empty, no new attribute will be added to the DOM document.</token-text>
      </arg-string>
    </do-trace-message>
    <do-if>
      <arg-conditions>
        <and>
          <if-local-variable mode="nocase" name="lv-RemAllFlag" op="equal">true</if-local-variable>
        </and>
      </arg-conditions>
      <arg-actions>
        <do-set-dest-attr-value name="Description">
          <arg-value type="string"/>
        </do-set-dest-attr-value>
        <do-strip-xpath expression='./modify-attr[@attr-name="Description"]/add-value'/>
      </arg-actions>
      <arg-actions>
        <do-if>
          <arg-conditions>
            <or>
              <if-local-variable mode="regex" name="lns-RemAttr" op="equal">.+</if-local-variable>
              <if-local-variable mode="regex" name="lns-AddAttr" op="equal">.+</if-local-variable>
            </or>
          </arg-conditions>
          <arg-actions>
            <do-add-dest-attr-value name="Description">
              <arg-value type="string"/>
            </do-add-dest-attr-value>
            <do-strip-xpath expression='./modify-attr[@attr-name="Description"]/add-value'/>
          </arg-actions>
          <arg-actions/>
        </do-if>
      </arg-actions>
    </do-if>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Add the "remove-value" tag if the node-set is not empty and then populate it from the node-set using the clone by XPath. Since the node-set was generated with the "value" tags included, this creates the correct format in the DOM document.</token-text>
      </arg-string>
    </do-trace-message>
    <do-if>
      <arg-conditions>
        <and>
          <if-local-variable mode="nocase" name="lv-RemAllFlag" op="not-equal">true</if-local-variable>
          <if-local-variable mode="regex" name="lns-RemAttr" op="equal">.+</if-local-variable>
        </and>
      </arg-conditions>
      <arg-actions>
        <do-append-xml-element expression='./modify-attr[@attr-name="Description"][last()]' name="remove-value"/>
        <do-clone-xpath dest-expression='./modify-attr[@attr-name="Description"][last()]/remove-value[last()]' src-expression="$lns-RemAttr"/>
      </arg-actions>
      <arg-actions/>
    </do-if>
    <do-trace-message level="20">
      <arg-string>
        <token-text xml:space="preserve">Add the "add-value" tag if the node-set is not empty and then populate it from the node-set using the clone by XPath. Since the node-set was generated with the "value" tags included, this creates the correct format in the DOM document.</token-text>
      </arg-string>
    </do-trace-message>
    <do-if>
      <arg-conditions>
        <and>
          <if-local-variable mode="regex" name="lns-AddAttr" op="equal">.+</if-local-variable>
        </and>
      </arg-conditions>
      <arg-actions>
        <do-append-xml-element expression='./modify-attr[@attr-name="Description"][last()]' name="add-value"/>
        <do-clone-xpath dest-expression='./modify-attr[@attr-name="Description"][last()]/add-value[last()]' src-expression="$lns-AddAttr"/>
      </arg-actions>
      <arg-actions/>
    </do-if>
  </actions>
</rule>

So I hope my journey down this Rabbit Hole has been helpful with some ideas and thoughts on how to work with node-sets and XPath Union, Intersection and Difference options as well as a Position example in there.

I am sure that you can find some other ways to apply some of this logic, and if you think the logic is flawed, feel free to comment and add your thoughts.

For those desiring extra credit, there is one test case I have not included in this solution. If the <add-value> is already in the Remove Values node-set, the final output will have a remove of that value followed by an add of that value. What has to be added to remove the add value from the remove value node-set?
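
For anyone who wants to check their answer, one possible approach is to mirror the remove-value handling: when a value is added, drop it from the Remove Values list first. The Python sketch below models that idea with plain lists; the `merge_with_add_cleanup` name and sample data are hypothetical. In the policy itself, the equivalent would presumably be a node-set filter such as `$lns-RemAttr[not(. = $current-node)]` inside the add-value loop, following the same pattern the Rule already uses to trim the Add Values node-set, though I have not tested that in a driver.

```python
def merge_with_add_cleanup(instances):
    """Merge (removes, adds) pairs; an add also cancels a pending remove."""
    remove_values, add_values = [], []
    for removes, adds in instances:
        for value in removes:
            if value in add_values:
                add_values.remove(value)
            elif value not in remove_values:
                remove_values.append(value)
        for value in adds:
            if value in remove_values:
                # The extra step: a later add overrides an earlier remove.
                remove_values.remove(value)
            if value not in add_values:
                add_values.append(value)
    return remove_values, add_values


# "Line 1" is removed by the first instance, then added by the second.
instances = [(["Line 1"], []), ([], ["Line 1"])]
print(merge_with_add_cleanup(instances))  # prints: ([], ['Line 1'])
```

Without the extra step, the same input would leave "Line 1" in both lists, which is the remove-then-add output described above.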




Comment List
  • I should have checked the standard myself. I will update my references to follow also. I find it amusing how we can each look at the same term and see it different ways. Thanks for your feedback.

  • Regarding searchability and preferred spelling.  I was under the impression the correct/official way to write this was with a hyphen. In other words node-set is correct, rather than nodeset or node set.

    At least that is how it is written in the W3C documentation. I have now updated my article to consistently reflect the W3C format.

    Glad my article was of help.

  • I like reading these since I always learn something.  I never thought about this aspect, using position() with less than or greater than tests.  So early on you did:


    I do not think you need ./ at the beginning though.

