Another Node Set Size Discussion


Novell Identity Manager uses a number of data types internally when processing data. One very interesting data type is the nodeset.

In a couple of previous articles I have been trying to come to grips with how much memory a nodeset takes up. I cannot find this written down anywhere, so I am trying to work it out empirically.

After writing those articles I was still not satisfied, since I was not getting the kind of data I really wanted.

I was able to reuse the same setup, look at where my numbers were coming from, and narrow the focus down even further.

In the second article (More thoughts on the size of a node set in Identity Manager) I think my mistake was in how I counted nodes. I only counted the instance nodes that the Query returned, using the XPATH function count(), specifically count($VARNAME). However, after thinking about it, I realized that count($VARNAME) counts only the top-level nodes (which happen to be <instance> nodes).

For my next iteration of this experiment, I added count($VARNAME/attr) to get a count of the attribute nodes. Then I added count($VARNAME/attr/value), since some attributes (like DirXML-Associations) are usually (if not always) multi-valued. In this case, though, it turns out every <attr> node held a single <value> node.

I think I could have done something like count($VARNAME/attr[@attr-name="DirXML-Associations"]) to count only the <attr> nodes for the DirXML-Associations attribute. Lots of fun can be had in XPATH! For more information on using XPATH in fun ways, look at some of the other articles on using XPATH in Novell Identity Manager.
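To make the three counts concrete outside of the engine, here is a minimal sketch in Python. It is not how Identity Manager evaluates XPATH; it only mimics count($VARNAME), count($VARNAME/attr), count($VARNAME/attr/value) and the filtered DirXML-Associations count using the limited path support in the standard xml.etree.ElementTree module. The sample document, the wrapping <output> element, and the function name count_nodes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written query result (not real engine output).
SAMPLE_XML = """
<output>
  <instance class-name="User">
    <attr attr-name="Surname"><value>Doe</value></attr>
    <attr attr-name="DirXML-Associations">
      <value>driver-one</value>
      <value>driver-two</value>
    </attr>
  </instance>
  <instance class-name="User">
    <attr attr-name="Surname"><value>Smith</value></attr>
  </instance>
</output>
"""


def count_nodes(xml_text):
    """Count <instance>, <attr>, and <value> nodes in a query-style document."""
    root = ET.fromstring(xml_text)
    # Analogous to count($VARNAME): only the top-level <instance> nodes.
    instances = root.findall("instance")
    # Analogous to count($VARNAME/attr).
    attrs = [a for inst in instances for a in inst.findall("attr")]
    # Analogous to count($VARNAME/attr/value).
    values = [v for a in attrs for v in a.findall("value")]
    # Analogous to count($VARNAME/attr[@attr-name="DirXML-Associations"]).
    associations = [
        a
        for inst in instances
        for a in inst.findall("attr[@attr-name='DirXML-Associations']")
    ]
    return {
        "instance": len(instances),
        "attr": len(attrs),
        "value": len(values),
        "associations": len(associations),
    }


counts = count_nodes(SAMPLE_XML)
print(counts)
```

In the sample, the multi-valued DirXML-Associations attribute is what makes the <value> count larger than the <attr> count, which is exactly the difference the extra count() calls were meant to catch.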

This time around the numbers I got were:

Total number of nodes: 73,835, which was made up of 9,327 <instance> nodes, 32,254 <attr> nodes, and 32,254 <value> nodes.

The memory used was:

Start: 23,789,568 bytes

Finish: 201,392,128 bytes

For a total change of 177,602,560 bytes.

Now for some simple math fun... Dividing the change in memory by each count, that would leave us 19,041.77 bytes per <instance> node.

Then 5,506.37 bytes per <attr> node, the same 5,506.37 bytes per <value> node, and finally 2,405.40 bytes per node when counting all nodes (<instance>, <attr>, and <value>).

I think it is that last number that has the most meaning. That is about 2.4K for each actual node, because this time we divide the memory we think we used by the total number of nodes. Before, we divided by the number of <instance> nodes only, so the result differed depending on how many <attr> and <value> nodes each instance contained.
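The arithmetic above is easy to reproduce. The short Python sketch below uses only the figures quoted in this article; the constant and function names are my own, and it assumes (as the article does) that the whole change in memory is attributable to the nodeset.

```python
# Figures quoted in the article.
START_BYTES = 23_789_568
FINISH_BYTES = 201_392_128
INSTANCE_NODES = 9_327
ATTR_NODES = 32_254
VALUE_NODES = 32_254


def bytes_per_node(delta_bytes, node_count):
    """Average bytes per node, rounded to two decimal places."""
    return round(delta_bytes / node_count, 2)


delta = FINISH_BYTES - START_BYTES
total_nodes = INSTANCE_NODES + ATTR_NODES + VALUE_NODES

print("Memory change:", delta)        # 177602560
print("Total nodes:", total_nodes)    # 73835
print("Per <instance>:", bytes_per_node(delta, INSTANCE_NODES))  # 19041.77
print("Per <attr>:", bytes_per_node(delta, ATTR_NODES))          # 5506.37
print("Per <value>:", bytes_per_node(delta, VALUE_NODES))        # 5506.37
print("Per node (all):", bytes_per_node(delta, total_nodes))     # 2405.4
```

Swapping in your own start and finish memory readings and node counts gives the same breakdown for a different query.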

Overall, this is still useful information. I think it is pretty safe to consider 10K a node a reasonable amount to plan for, and it is unlikely you will run into under-estimation issues that way. (Unless, of course, you have unusual attributes that store a LOT of data. One example might be the photo attribute, which stores a base64-encoded JPG image. Clearly that is an exceptional case, and you would be planning for that anyway.)

