HP UCMDB MySQL Probe Tuning Solution

TL;DR: A properly configured MySQL database on the HP UCMDB Data Flow Probe is dramatically more reliable and faster. You need to tune the Probe properly, and we share the method and queries to do so. After tuning, jobs are more reliable, results arrive faster, errors are fewer, we see no corruption, and clearing the probe cache is no longer a requirement of daily use. Integration and Federation jobs improve by up to 366-fold. These changes have been made in half a dozen enterprises and have no negative impact on features, functions or licensing.


The free tuning package and the results of the tuning are available here.



Hi folks,


I am the CEO of Effectual, an SPM-only HP Partner. I've been working with the UCMDB technologies since Mercury first acquired it from Appilog. I’ve been thinking about and using UCMDB in effective ways for something close to a decade.

I founded Effectual to systematically approach and resolve the barriers and challenges every UCMDB project must overcome. Over the past three years, my engineers have often turned to this support forum for guidance. This is my first time posting since the old Mercury support forums were retired, but I expect this is one of the most important findings I could share in my career.


The MySQL version of the Probe (every version of UCMDB prior to 10.10) has shipped with a misconfigured set of variables that has caused a huge amount of pain for every UCMDB user working with anything larger than a tiny data set. In addition to exceeding the originally defined variable limits, the Probe required additional setup to really achieve the use case and technical requirements we have asked of it.


We are sharing the root causes and the solutions with you in full. While we’re also making the usual marketing efforts to let people know, this thread goes much deeper into what the problem is and why it exists, so that Support and other users can understand how significantly it impacts their daily lives.


Bear with me, this will not be a short post.


  • Simplified Version of the Probe in Operation


We first need to understand that the Probe is not a black box. It is a series of applications, processes and functions that all have moving parts. These parts all behave differently depending on what the data moving through them consists of, how it is written to the database, and how it is passed back to the Probe application and on to the UCMDB.


The MySQL version of the database uses the MyISAM database engine, where the .MYD files hold the table data and the .MYI files hold the table indexes. Every single step of the Probe operation writes to MyISAM, whether to record how far the job has progressed, to return results from Discovery, or to normalize CI results from an integration. These writes are done by the operating system, which manages its own memory and threads outside of the MySQL database.


At regular intervals in the processing of any job, the MyISAM results are queried back into memory via the InnoDB features. Although these results are “in memory”, the out of the box configuration also double writes them to the IBDATA1 file and its logs. This supports features like ACID and keeps track of transactions that are mission critical in the event of a failover.


Once the DFP application has the results from the InnoDB process, it hands them over to the UCMDB, where they are processed: recon engine, history tables, etc. This cycle repeats over and over again in small chunks to prevent “overwhelming” the UCMDB until all the results are processed.


At each step through this process, the application updates the database to keep track of where it is in the process – these results are also queried by the UCMDB UI as “Discovery Progress” and “Discovery Results”. Left alone, this process runs smoothly and without attention from a human, just as designed, until the data size exceeds the key buffer cache size and the process grinds to a halt.



  • Key Buffer Cache



With the out of the box configuration, it isn’t uncommon to see a job that completed 14 days earlier still chunking results from Probes through the DAL log to the UCMDB. It isn’t uncommon to have a job hung perpetually at a certain percentage, or jobs failing and returning errors hours or days later after finally timing out.


This isn’t because the workload is heavy or because the job failed on its own merits. It is because a mechanism used by the MySQL MyISAM engine, called the Key Buffer Cache, is running out of available memory blocks and forcing a horrible swap and write to/read from disk condition. There’s a reason the symptoms are so predictable: performance of the Probe breaks down once the cumulative database index usage exceeds this one limit.


The Key Buffer Cache, or Key Cache, is used to keep the .MYD and .MYI index blocks in memory. It also needs additional space to handle joins across those index blocks, such as for result processing of large jobs with many relationships. A small dataset would never experience these problems, but once the .MYD files and the index used to operate on the database table exceed the key buffer cache setting of 384MB, all bets are off and things begin to spiral out of control.


We have established that the Probe job sequence has many repeated steps that take place between the application, the actual database, the temporary database file, local log files and back to the application, then on to UCMDB. UCMDB can also demand information from the Probe when you ask for it in the UI. Lots and lots of throughput and repeated processes over time all banging head on into the limited key buffer cache.


This happens over and over and over again making the problem worse and the response time slow down exponentially. Once the Probe database size exceeds that cache, the only relief a customer has is to restart the servers, Clear Probe Cache Data, empty out those tables, and start the process over again.


Or you can properly size the key buffer cache and give MySQL adequate Free Memory – and all of these symptoms go away.



  • I/O Memory & Probe Data Table Sizes



You know the one thing not working hard at this point? The UCMDB.


By design the MySQL MyISAM engine makes use of the operating system to handle writes and reads, work buffers and blocks from memory to disk. The OS threads are used in every function, not just as a result of swapping or paging. Every healthy write and read uses OS threads and needs Free Physical Memory to work well.


The rule of thumb from MySQL DBAs is that the MyISAM engine’s key buffer cache should not be set to more than 20% of the total memory on the server, with the expectation that the rest of the memory will be consumed by the operating system and thread-related work.


So when tuning the Probe, each server should have, on top of the Key Buffer Cache, at least four times as much memory as we allocate to that cache. This has worked out pretty well for us, as you’ll see in our last section on how to evaluate and tune the Probe.
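
As a rough illustration of that rule of thumb, the arithmetic can be sketched in Python. The function names are made up for this example, and the 20% ceiling is simply the DBA guideline quoted above, not a measured limit.

```python
def max_key_buffer_mb(total_ram_mb: float, max_fraction: float = 0.20) -> float:
    """Largest key buffer the 20% rule of thumb allows for a given amount of RAM."""
    return total_ram_mb * max_fraction


def min_ram_for_key_buffer_mb(key_buffer_mb: float, max_fraction: float = 0.20) -> float:
    """Smallest total RAM for which key_buffer_mb stays within the rule of thumb."""
    return key_buffer_mb / max_fraction


if __name__ == "__main__":
    # An 8 GB server: the guideline caps the key buffer at 20% of total RAM.
    print(max_key_buffer_mb(8192))
    # The out of the box 384 MB key buffer: total RAM is five times the key
    # buffer, i.e. the key buffer itself plus four times as much again.
    print(min_ram_for_key_buffer_mb(384))
```

Both helpers take the fraction as a parameter, so the same sketch works if you decide a different ceiling suits your Probe.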


We provide queries with our Probe Performance Tuning Guide so you can measure the size of your database tables and their corresponding indexes. You can also look at the .MYD and .MYI file sizes on the Probe and add them up, then add 15% on top.
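
The file-size approach can be sketched in a few lines of Python. This is an illustration only, not the query set from the guide: the function name is hypothetical, and the data directory you pass in is wherever your Probe's MySQL installation keeps its table files.

```python
import os


def myisam_footprint_bytes(data_dir: str, headroom: float = 0.15) -> dict:
    """Sum .MYD (data) and .MYI (index) file sizes found under data_dir.

    Returns both raw totals plus the combined total with headroom added
    (15% by default, as suggested in the text above).
    """
    data_bytes = 0
    index_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            ext = os.path.splitext(name)[1].upper()
            if ext == ".MYD":
                data_bytes += os.path.getsize(os.path.join(root, name))
            elif ext == ".MYI":
                index_bytes += os.path.getsize(os.path.join(root, name))
    combined = data_bytes + index_bytes
    return {
        "data_bytes": data_bytes,
        "index_bytes": index_bytes,
        "with_headroom_bytes": round(combined * (1 + headroom)),
    }
```

Keeping the data and index totals separate makes it easy to compare the index total against the key buffer setting and to see how far the index has outgrown the data.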


What we’ve seen in all of our testing of various environments and use cases with Probes is that the ratio of index size to data size with MyISAM isn’t constant. A Probe that has very simple jobs to run won’t have a big difference between data file and index size, but a Probe handling complex jobs or doing federation or integrations will have a significantly larger total index size. This is due to all the extra joins required when containment, composition, dependency or other related links are created.


In these cases the index of such a Probe is usually 1.5-2.0 times the size of the actual data in the tables. The more complicated the job and the more related the data results are, the more likely you’ll have a lot of joins (I’m looking RIGHT at you, Network and Application Discovery). We here at Effectual have been very careful with how our discovery and integration jobs are triggered and we use very specific TQL in every aspect of the work to keep the trigger input sizes to an ideal minimum.


Even with this careful and constant best practice, we couldn’t escape the reality of poor Probe performance and so we delicately separated out the job work load across many Probes and balanced schedules and job types. It was a pain. But it worked and many other folks do the same thing to get results.


After we had performed extensive testing, we went back to some of our Discovery farms and checked this hypothesis thoroughly. The largest Index we had across more than 30 different balanced probes was 481MB. So that particular probe was swapping, but the vast majority were under 400MB. We were able to correlate that the 481MB probe was historically much slower and less responsive to report results than the others.



  • IBDATA1, Temporary Tables, ACID, Doublewrites, Useless Overhead



Without going into query cache, table cache, open table cache, join buffer, read rnd buffer and all the other things we’ve tuned, we want to briefly mention the IBDATA1 and innodb_file_per_table settings that we want you to change.


As we mentioned in Part I and with the problems in Part II, the InnoDB functions eventually start paging. When there isn’t enough memory, or when work is sitting in memory waiting for more work to be completed, MySQL shuffles the active memory off to disk as a normal part of Windows function. So the longer something takes, the more likely disk writing is happening. Egad.


It’s a perfect storm that really adds up to long response times and poor observed performance. What isn’t intended is that the IBDATA1 file, which ends up being used for all kinds of unintended things, never shrinks or auto truncates. If a Probe is really struggling, it also does lots of “temporary” writes and reads from the IBDATA1 file. We’ve seen these files exceed 4 and 5 GB in size. That’s a lot of bottlenecked activity going into that one file, and it can have a real impact on an already distressed system. We don’t see any intended activity with the IBDATA1 file; it just happens that MySQL as configured operates that way.


The single IBDATA1 file also lacks any real read/write checksum control and is essentially an unregulated temporary space that sticks around permanently. So it’s possible the MySQL database is corrupting part or all of the result set it sends back to the UCMDB when a Probe is having a difficult time.


Your IBDATA1 file may not be very large (and there’s no way to disable the use of IBDATA1), but at least we can offload the actual temp file work to actual temporary files. That’s where innodb_file_per_table comes in, and our guide walks you through how to enable it. By following our guide, you’re giving MySQL the ability to create actual temporary tables and log files as they are needed. On a modern VM Probe with fast SAN, you’ll see temp files get created, fill up with data and disappear in a second. It’s awesome compared to the way the out of the box probe worked. These temp files most often hold joins and large complex queries, and with our configuration the results of that work are kept cached in memory.


Since IBDATA1 can’t clean up after itself, the odds are that a lot of the results that may be read back into memory could be bad. Or they could get pushed out of the file entirely and not be found. Think about the types of error messages we see most commonly with Discovery and Integration jobs. Think about bulk failures, think about duplicates being detected. These all originate not from the Discovery script, but from the stress condition of the Probe breaking down over time as the database size grows.


In short, the out of the box Probe was also configured more for a “financial website application” than the way we use it to process and manage Discovery results. The ACID compliant settings, the double writing, are essentially additional overhead where you might need a warm backup of the data for a MySQL database cluster. All of that has been disabled. We’ve tuned the I/O and tested it and mechanically changed the functions of the temporary tables, MyISAM buffers and InnoDB temporary table (and file) usage. The Probe use case does not require financial accountability for every transaction. We save time and resources by disabling this and it has an impact on performance.
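
To check whether a given Probe still carries the settings discussed so far, something like the following Python sketch could read the [mysqld] section of the MySQL configuration file. key_buffer_size, innodb_file_per_table and innodb_doublewrite are standard MySQL option names; the function names, the defaults assumed for missing options and the wording of the findings are assumptions made for this example, and the values our guide actually sets are not reproduced here.

```python
import configparser

SIZE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a MySQL-style size such as '384M' into bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in SIZE_SUFFIXES:
        return int(value[:-1]) * SIZE_SUFFIXES[suffix]
    return int(value)


def read_mysqld_settings(ini_text: str) -> dict:
    """Return the [mysqld] options with dashes normalised to underscores."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(ini_text)
    if not parser.has_section("mysqld"):
        return {}
    return {key.replace("-", "_"): value for key, value in parser.items("mysqld")}


def is_enabled(settings: dict, name: str, default: bool) -> bool:
    """Interpret a boolean option; a bare option name counts as enabled."""
    if name not in settings:
        return default  # assumed default, check your MySQL version
    value = settings[name]
    if value is None:
        return True
    return value.strip().upper() in ("1", "ON", "TRUE")


def audit(ini_text: str, total_index_bytes: int) -> list:
    """List findings for the settings discussed in this post."""
    settings = read_mysqld_settings(ini_text)
    findings = []
    # A missing key_buffer_size counts as 0 in this sketch.
    key_buffer = parse_size(settings.get("key_buffer_size") or "0")
    if key_buffer < total_index_bytes:
        findings.append("key_buffer_size is smaller than the total index size")
    if not is_enabled(settings, "innodb_file_per_table", default=False):
        findings.append("innodb_file_per_table is not enabled")
    if is_enabled(settings, "innodb_doublewrite", default=True):
        findings.append("innodb_doublewrite is still enabled")
    return findings
```

Passing the configuration as text rather than a path keeps the sketch easy to try against a copy of the file without touching the running Probe.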


There still remains a series of features from the Application that should be tuned, such as the actual SQL queries executed when a job starts and stops. The Select Count(*) and Insert/Delete From Where statements in the Probe database are not very efficient, and these now produce the only long running “Slow” entries in the MySQL Slow Log. This will be a lot easier for HP R&D to see over time and hopefully address, for example by replacing a 6 minute long delete statement with a drop or alter statement.



  • Tuning The Probe



We’ve misunderstood probe operation for years. So we’re also going to provide you with our observations and recommendations for how you should be tuning and thinking about sizing your own Probes.


Adding machine resources without at least changing the key buffer cache would not have shown any remarkable improvement in performance. Simply adding more RAM or CPUs would just mean you page more and swap less. With the problems we’re describing, adding more resources without changing the tuning would result in faster paging and would decrease overall run time, but would not alleviate any of the underlying pain and data corruption issues we’ve seen.


There is no one size fits all solution here because once you’re aware of the “ecosystem” of the Probe you’ll realize that your needs change as the maturity of the Discovery and Integration efforts grow. There’s certainly a “better” key buffer cache size for all, but even this will grow over time.


Effectual emphatically recommends that you build your Probes on virtual systems, as resources can be added or removed when the Probe role or workload changes, and I/O and disk are faster on modern SAN. It would be wasteful to allocate physical hardware and keep the Probe configuration the same. If you have a very large and dedicated physical environment for your Probes, you might want to consider a different architecture for the Probe databases. Please contact us for design and guidance.


The Probe as an ecosystem needs to run a good number of concurrent operations; the actual discovery processes, the application processes, the MySQL processes, the operating system I/O processes. We are currently standardizing on 4 Core, 8GB RAM for all Windows Discovery Probes and 9GB and 12GB for the Integration Probes. As a result of the tuning we need far fewer Probes to accomplish the same amount of work, with none of the previous pain.


At the end of a week full of Discovery job execution and processing your Probe should have at least 1.5GB of Free Physical Memory unallocated. This prevents the OS and the above processes from swapping. Reducing swapping is good for your processor as well and you can find a good balance between the right size memory, just enough swapping and paging and good I/O. It takes patience and discipline and paying attention to the Probe MySQL databases with the queries we provide.


Every single operation the MySQL database performs requires I/O – although the Effectual tuning will remove a great deal of this excessive I/O, the fastest I/O storage should be used. Avoid traditional attached physical HDD for the database and probe operation.


In addition to these rules of thumb, you need to tune the Probe key_buffer_size (the MySQL variable that sizes the key buffer cache) and innodb_buffer_pool_size based upon the actual and intended growth of your database tables and index size. Innodb_buffer_pool_size will need to be increased when we resolve some of the sipping that the UCMDB does from the Probe. When we tune the UCMDB to take more results, the Innodb_buffer_pool_size will begin to cause excess paging due to the rate at which data will be able to flow out of the Probe database. Not a big consideration today.


As previously mentioned, MySQL DBAs have a rule of thumb for the MyISAM key buffer cache (key_buffer_size): it should not exceed 20% of the available Physical Memory. Effectual’s test results show that the key buffer should be at least the size of your total data files, if not 1.5x the size of your data files, and then have room for growth. Once the total Index size exceeds the key buffer, you’re heading back towards poor performance. Setting the key cache higher does not mean the system will immediately use all that memory, just that there is more for the Index if needed.


So the general physical memory minimum recommendation we’re comfortable making is:


Key buffer cache (key_buffer_size) = Total Data Table Sizes at Max Possible Size * 1.5
MySQL memory = key buffer cache * 1.5 (for threads, buffers, cache)


Example: 2 GB total data table size * 1.5 = 3 GB for the key buffer cache, and 3 GB * 1.5 = 4.5 GB for the MySQL Session. Assuming you’re running a 64 bit OS and 64 bit versions of the Probe and Database, you can set your key buffer cache to whatever size you have available. The rest of the Free Memory on the box should go to the Java application (2GB) and the operating system (whatever is left).


Effectual also recommends having at least 1.5 GB of free Physical Memory at all times. For this 2GB data table size, that means you’re looking at a recommendation of 9 GB of Physical RAM for ideal throughput and growth. Our large table Probes have 9GB and 12GB of RAM and will continue to grow as our CMS projects get larger.
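
The sizing arithmetic above can be written out as a small Python sketch. The function and parameter names are invented for this example; the multipliers, the 2 GB Java allowance and the 1.5 GB free-memory floor are the figures from this post. Note that the named components for the 2 GB example add up to 8 GB, so the 9 GB recommendation would leave roughly 1 GB for the operating system. That last split is an inference, not something measured.

```python
def probe_memory_plan_gb(
    max_data_gb: float,
    index_factor: float = 1.5,
    mysql_overhead_factor: float = 1.5,
    java_gb: float = 2.0,
    free_gb: float = 1.5,
) -> dict:
    """Break the memory recommendation into its named components (in GB)."""
    # Key buffer: data table size at its maximum expected size, times 1.5.
    key_buffer_gb = max_data_gb * index_factor
    # MySQL session: key buffer times 1.5 for threads, buffers and cache.
    mysql_gb = key_buffer_gb * mysql_overhead_factor
    # Everything named explicitly; the operating system needs memory on top.
    subtotal_gb = mysql_gb + java_gb + free_gb
    return {
        "key_buffer_gb": key_buffer_gb,
        "mysql_gb": mysql_gb,
        "java_gb": java_gb,
        "free_gb": free_gb,
        "subtotal_before_os_gb": subtotal_gb,
    }


if __name__ == "__main__":
    for name, value in probe_memory_plan_gb(2.0).items():
        print(f"{name}: {value}")
```

Re-running the function with your own projected table size gives a starting point; the free-memory check at the end of a busy week remains the real test.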

If you’re not running DDMI you can turn off the HP Universal Discovery XML Enricher application which consumes 500-600MB of memory on the Probe. We disable this service on our Integration only probes. If you do run DDMI, you’ll need additional cores and 2GB of additional memory just to accommodate this service based on these findings.



  • Conclusion & Final Thoughts



As a former engineer and now a CEO, I’m a terrible writer and this is only a draft. I’m sure someone could do a better job than I of polishing up this language. I’m sure my writer/communications person will be doing that soon. However, I felt we couldn’t let this continue and so I’ve done my best with a dozen or so hours of work writing this out for the forum.


We will be releasing alternate configurations, including shared database tables in memory and super high performing MySQL instances alongside our results of PostgreSQL testing in the fall of 2014. We will not be keeping this as a proprietary secret, there’s just too much at stake. Our mission at Effectual is to see more HP customers succeed with UCMDB and CMSes in particular. This will help in every way.


We are currently performing tests against UCMDB and the Probe application. Delta syncs are fluid and take seconds. Larger jobs and Discovery results processing seem to have serious artificial “functionality” put into place to slow down the result processing.


To provide a sense of where we are heading next: by tuning the Probe and UCMDB task result behavior, we’ve been able to move 1,000,000 CIs from the Probe to the UCMDB in under 20 seconds. This created ten 5MB DAL log files in under a minute during the test. UCMDB handled it just fine, with the longest batch UCMDB update taking 636ms. We’re now looking at the history and Last Access time mechanics and the reconciliation engine chunk and merge in fuses.


Stay tuned to our Blog at EffectualSystems.com for more information as we release it. We do welcome discussion on this. We’re not going to stop until we can help demonstrate to the market how amazing a product UCMDB is. No other vendor supports ITIL centric value propositions like HP does and the new History and Integration capabilities of UCMDB are unmatched.


I look forward to discussing this and hearing your thoughts.




Erik Engstrom

  • I am genuinely surprised there was no response to this thread in the couple of weeks or so since it was posted.

    Should we even bother to share this information?

  • Hi Erik,


    I'm surprised too. It would have been of great value to me 6 months ago when I was still using 10.01, but now I'm on 10.10, so if you were to post a similar document for the databases used in 10.10 probes I would be very enthusiastic!


    I think this kind of knowledge is a great plus for this forum.




  • Hi Ronald,


    Thanks for your response. How did your upgrade go? Have you noticed any different behavior with 10.10 discovery or integration performance?





  • Hi Erik,


    The upgrade itself went OK, a lot better than when we upgraded from 9 to 10.

    But we've still had some issues, and still have them after the upgrade.

    The integration to HP SIM did not work anymore, and it still doesn't work via the database IP of HP SIM. Instead we are now using the webservice integration, which performs well but still has issues.


    The discovery jobs themselves don't seem to perform better or worse than before. What surprises me is that the job Range IPs by ICMP runs very long on 1 probe that has the best hardware, but we also suffer a lot of performance issues in the UCMDB itself, so that may be related.




  • Ronald,


    Did you perform an upgrade in place? If you still have a version 9 instance floating around, you could always integrate SIM to 9, then pipe it over via a Population to 10.10.

    If you're ever bored, would love to find out more about your challenges - your input could help quite a bit in how we prioritize and what we look at first.


  • We recently completed our upgrades of all our discovery and integration probes and provided additional resources to all HP software servers.

    Early trends show a 4x-5x improvement. This improvement is capped only by running out of resources, so the ceiling can go higher and the conclusion may simply be: once tuned, the system can scale with resources. What a nice change.

  • Hi Erik,

    Thanks for your interest, and as a UCMDB admin I'm never bored... we have a lot of challenges!
    I will name some for you in order of importance to myself.

    Integration UCMDB/CM - Service Manager: we would very much like to use the RFC and incident widgets in the UCMDB browser and also
    the change integration between CM and HPSM to link real changes to authorization actions.
    I've had parts of them working a bit but it doesn't perform. What I miss from HP here is a good set of documentation and examples of
    what to functionally expect from the integration, how to configure it in various configurations and how to troubleshoot.

    CPU usage on the UCMDB server... the usage on the 4 core system is mostly between 100% and 380% and we cannot find why or what it is doing.
    I tried troubleshooting it via the statistics logs/operations logs and, for example, I see Operations executed: 220728 in the last 15 minutes, which seems a lot to me. When I try to go into the 10 worst operations I soon get stuck, since I don't understand what it means.
    And that brings me to another point... troubleshooting logs. There are a lot of logs on the probe and server, and for every log you can set various log levels.
    What is very annoying is that these logfiles are unreadable... which errors should I worry about or take action on, which not, how to find the errors that
    are important, etc.

    In CM we have an issue: why do some changes keep being presented while there are no changes?
    In Modeling Studio: how to use the instance based models? The docs are not clear enough for me to get done what I want.
    Integration between HPSM and UCMDB: I've built an integration to get assignment groups and persons from HPSM, but sometimes all groups and persons are doubled.
    Integration between HPSM and UCMDB: I once made a job that sets the status of CIs in HPSM to a custom field in the CIs in UCMDB, but it created a lot of duplicate CIs.

    etc etc..


  • Ronald,


    Your issues cover some pretty commonly trodden ground. I only partially joke when I ask, "Do you want cake or death?". Cake in this case being the longer term, more complicated answer. Death, well, being the same outcome over and over again.


    I would encourage you to read our multi-ucmdb architecture white paper and CMS business case. The performance problems you mention are fully resolvable and do not require you to "start over", only deploy new instances and populate and move data over. The other major issues you're running into with expected data quality are very challenging to solve with the out of the box integrations and their basic approach.


    One instance of the UCMDB CANNOT HANDLE specific data and its relationships as it transforms from something that WAS to something that IS – it is all or nothing. You will always encounter these problems as the tool can only support one phase of the data set’s reconciliation, the “was” or the “is” and so one off integrations cause havoc because it tries to operate on a small piece of data in a large fabric of unchanged data. While CM has some short cuts it can prove very challenging to incorporate with a single instance and WITHOUT a Configuration Management System approach.


    Reconciliation in UCMDB only has a single rule set, it can only normalize the “was” or the “is” and when it can’t it will automatically create a new object or the package will fail with the out of the box adapters with a single customer. But what if you have a UCMDB instance that handles the “was” data set and a UCMDB instance that handles the “is”? That works with the current capabilities of the tool and using populations and keeping data "synchronized" across ITSM, the CM instance, Discovery instance and UCMDB CMS master data set means one set of data can normalize even as it changes and has multiple states - authorized vs. actual.


    We go into a lot of detail in our approach and architecture and I'd be happy to help you further to solve the underlying issues that cause these individual pain points. Mainly the multi-ucmdb's help with performance AND provide the ability to handle multiple sets of reconciliation priorities. This with good TQL and integrations equals GOOD and CONSISTENT results in your data that allow you to then pursue CM, CMS, SACM, ITAM, or whatever integrated solution you might want. Easiest way is to start with a conversation and a webex.

    For UCMDB log examination you'll want to use Baretail and focus on the error log, data access layer (dal), slow log and operation log. For Probe behavior and results you'll want to watch the probewrappergw log on the Probe itself.

    From the UCMDB Administrators Guide - these two are super important.


    CMDB Dal Log

    Log File: cmdb.dal.log
    Description: Information about activity that occurred in the data access layer, the layer that works with the CMDB.
    Information Level: Not available.
    Error Level:
    • Connection pool errors
    • Database errors
    • Command execution errors
    Debug Level:
    • All DAL commands executed
    • All SQL commands executed


    CMDB Operation Statistics Log

    Log File: cmdb.operation.statistics.log
    Description: Statistics for all operations performed in the past 15 minutes including worst operation instances.
    Information Level: Statistics per operation including operation class name, caller application, and customer ID. Default of 10 worst operation instances.
    Error Level: Disables the statistics feature.
    Debug Level: Not available.
    Basic Troubleshooting: Check when there is a performance slowdown.


    You can freely change the logging level to INFO without system performance impacts (and reducing logs doesn't save much, unless you're on an old physical disk storage system). The logger and functions are loaded at boot and can be adjusted on the fly.


    As to understanding specific error messages - there's a ton of spam, but after you have a lot of experience with it, reading them gets easier. We can walk you through and explain how to read the log, trace threads and cross-reference certain problems in a webex.

    We have developed replacement adapters that help integrations and federation work smoothly. They are full replacements and instead of federating or pulling small pieces of feature-specific data from UCMDB or ITSM, the adapters actually merge the data sets and keep them synchronized. A change in one is updated in the other within a few seconds. That's part design and the outcome of our probe tunings.

    Knowledge is free at Effectual. But we have to ask that you meet us half way and make it easier to share. Forums are not an efficient way of closing the gap between what you know and what you don't. These are complex issues and it is very time consuming to support the many headed hydra of UCMDB problems one at a time. We'd much rather help explain and solve the underpinning problem that causes MOST of the problems in the entire suite of tools.


    And LASTLY, I have attached as a reference the UCMDB sizing and fuse changes that we use internally. This isn't a fix-all, but many of these fuses are so out of date that you need to understand the scope of change and type of fuse changes that we employ. Most of these we treat internally as "default" replacement settings we make when we install. We can do so because we know how to use the UCMDB without tipping it over. If you adjust some of these settings, you will still see limits and problems with performance - if your TQL results are too large, the TQL is inefficient or you somehow manage to do something you don't understand that chews up the UCMDB writer resources.

    Sounds like you already have some of those problems, so I'll offer again to help with a webex. This coming Wednesday at 8AM PDT we have a session scheduled to provide exactly this kind of support. We do twice monthly (second and fourth Wednesday of each month) open support calls to help people tune their Probes, answer questions, find root causes to errors. You can join: 1-619-550-0008, 196-657-325 (https://global.gotomeeting.com/join/196657325).

    Or reach out to me at my email or PM for a different time that works.



  • Hi Erik,


    I still need to publicly thank you for your effort. We've had a successful websession and adjusted only 1 fuse, which has had an enormous impact on the system in a positive way.


    I'm excited about your knowledge and looking forward to more cooperative work to get things better.



