BSMConnector/Ci Resolver - Problem while sending data to BSM, SocketTimeoutException
I'm facing a new issue with BSM Connector 9.23: sometimes it fails to send samples to BSM and logs error messages like this one in error.log:
2017-04-12 10:37:50,333 [com.mercury.sitescope.integrations.bac.reporter.GuaranteedDataReporter[fr0-bsm-p01.eu.airbus.corp]] (ApacheHttpUtils.java:473) ERROR - problem while sending data to URL: http://fr0-bsm-p01.eu.airbus.corp/ext/mod_mdrv_wrap.dll?type=report_ss_samples error: URL: http://fr0-bsm-p01.eu.airbus.corp/ext/mod_mdrv_wrap.dll?type=report_ss_samples, host: fr0-bsm-p01.eu.airbus.corp, port: 80, UsingProxy: false, isHTTPS(SSL): false, org.apache.commons.httpclient.HttpRecoverableException: java.net.SocketTimeoutException: Read timed out, currentRetry: 0
Is there any setting to increase the "currentRetry" value or the timeout value? I tried to add _keepTryingForGoodStatus in master.config but that seems to have no effect.
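To see how often and when these timeouts occur, the error.log entries can be tallied per minute. The following is only a rough sketch: the regular expression is inferred from the single log line quoted above, not from any documented log format, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the one error.log line quoted in this thread
# (an assumption, not a documented format):
#   <timestamp> [<thread>] (<source file>:<line>) <LEVEL> - <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*)\] "
    r"\((?P<source>[^()]+)\) "
    r"(?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_read_timeouts(lines):
    """Count ERROR lines mentioning SocketTimeoutException, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is None or fields["level"] != "ERROR":
            continue
        if "SocketTimeoutException" in fields["message"]:
            # The first 16 characters are "YYYY-MM-DD HH:MM".
            per_minute[fields["timestamp"][:16]] += 1
    return per_minute


# The log line from this thread, used as sample input.
SAMPLE = (
    "2017-04-12 10:37:50,333 "
    "[com.mercury.sitescope.integrations.bac.reporter.GuaranteedDataReporter"
    "[fr0-bsm-p01.eu.airbus.corp]] (ApacheHttpUtils.java:473) ERROR - "
    "problem while sending data to URL: "
    "http://fr0-bsm-p01.eu.airbus.corp/ext/mod_mdrv_wrap.dll?type=report_ss_samples "
    "error: URL: "
    "http://fr0-bsm-p01.eu.airbus.corp/ext/mod_mdrv_wrap.dll?type=report_ss_samples, "
    "host: fr0-bsm-p01.eu.airbus.corp, port: 80, UsingProxy: false, "
    "isHTTPS(SSL): false, org.apache.commons.httpclient.HttpRecoverableException: "
    "java.net.SocketTimeoutException: Read timed out, currentRetry: 0"
)

if __name__ == "__main__":
    print(count_read_timeouts([SAMPLE]))
```

Run against the whole error.log (for example `count_read_timeouts(open("error.log"))`), this would show whether the failures start a few minutes after a restart or are spread evenly over time.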
Could you please open a new thread on the OMi forum, since BSM Connector is the OMi engineers' area of expertise?
Here is the link for your reference:
Check the BSM logs under <GW>\HPBSM\log\wde\wde.log or wde.all.log and make sure BSM is able to receive samples.
You didn't mention whether your BSMC is sending events or metrics.
The BSM Connector has policies which send metrics and topology.
I checked the logs on the Gateway and all I see are lines like this:
2017-04-12 14:32:46,346 [UCMDB Query Result Notifications Service Notification Thread] (Index.java:126) INFO - Indexed 21169 CIs.
Sounds good to me.
One more piece of information I forgot: after a restart, the Connector is able to send metrics to BSM for a while, but after a few minutes it begins to fail and then every attempt to send data fails.
I disabled many of the Connector's policies to see whether the issue is linked to the load.
OK it seems the issue is linked with a specific policy. When not enabled, everything runs smoothly.
I'll investigate further and post the solution once I find it.
Ok I've found out that it's related to the RTSM.
In fact, the policy sends metrics to BSM and also creates the related CIs with a topology script.
The problems start when there are a lot of CIs of this type (we currently have 4000+ of them).
I ran tests with a new CI type that had no instances. With 1000 CIs created, there was no problem. But with more than 3000 CI instances, the HTTP requests that BSM Connector sends to BSM take longer and longer and eventually time out (which blocks all other policies).
So my new question is: is there a way to make the RTSM handle a larger number of CI instances? Maybe by setting up a better identification process for the CI?
Ok I know it's related to the CI Resolver mechanism of BSM.
I tested the resolveHint function through the JMX URL http://<DPS>:29922/mbean?objectname=opr.ciresolver%3Aname%3DCiResolverMBean:
- When I search for a CI for which there are a few instances, the search is very fast.
- When I search for a CI for which there are 4000+ instances, the search takes a few seconds.
So I suspect that when the BSM Connector makes the "report_ss_samples" request, it sends many samples to be resolved and each of them takes a few seconds... which makes the HTTP request very long, and it eventually times out!
Any advice on how to improve CI Resolver performance when a large number of CI instances has been created? I tried playing with the TQL and cache settings, but no luck so far.
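The suspicion above can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes the samples in one request are resolved one after another, and the timeout and per-lookup figures are made-up placeholders, not values measured on BSM.

```python
import math


def request_duration(samples_per_request, seconds_per_lookup):
    """Estimated time to resolve every sample in one request, one after another."""
    return samples_per_request * seconds_per_lookup


def max_samples_before_timeout(read_timeout_s, seconds_per_lookup):
    """Largest number of samples whose sequential resolution fits in the read timeout."""
    if seconds_per_lookup <= 0:
        raise ValueError("seconds_per_lookup must be positive")
    return math.floor(read_timeout_s / seconds_per_lookup)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    read_timeout_s = 120    # assumed client read timeout
    fast_lookup_s = 0.5     # assumed lookup time with few CI instances
    slow_lookup_s = 2.0     # assumed lookup time with 4000+ CI instances

    print(max_samples_before_timeout(read_timeout_s, fast_lookup_s))
    print(max_samples_before_timeout(read_timeout_s, slow_lookup_s))
    print(request_duration(100, slow_lookup_s))
```

Under these assumptions, a per-lookup time that grows with the number of CI instances quickly shrinks the number of samples one request can carry before the read timeout is hit, which would match the behaviour seen after the CI count passes 3000.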