OMi gateways stop processing downstream events
I'm at a bit of a loss as to why this has started happening: our OMi solution has completely died 4 times in the last month, but prior to that it had no issues since it was installed almost a year ago.
We are running OMi v10.60 on Windows 2012 R2 servers.
We have 2 Gateway servers and 1 data processing (DP) server; the SQL Server database sits on another VM.
OMi itself is nothing more than a presentation layer. We have feeds from 2 OML servers and an OMW MOM server sending their events into it for glass watching and API processing.
Looking at the gateway servers I can see the following in the opr-gateway log:-
ERROR msghandler.processcall(176) - JMSException when trying to submit event via JMS bus.: Failed to send events -- AMQ119014: Timed out after waiting 30,000ms for response when sending packet 43 -- AMQ119014: Timed out after waiting 30,000ms for response when sending packet 43
ERROR msghandler.processcall(190) - Read in new message Exception. Call: PUT /com.hp.ov.opc.msgr/rpc HTTP/1.1 cache-control: no-cache
Then there is a series of SOAP messages followed by a JMSException failure: "Failed to send OM event updates -- transaction was rolled back."
Then all the downstream connections get severed:-
EventSyncThread.destroy(210) - Stopping eventsyncthread....
I can't seem to get it working without doing a full restart of the solution which takes a long time.
And even then it's not guaranteed to work.
Has anyone seen this before? I've got a support ticket open, but at the moment the whole solution is down and I don't believe anyone is available at weekends.
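For anyone wanting to catch this earlier, the error signatures quoted above could be counted with a small script. This is only a rough sketch in Python: the patterns are taken from the log excerpts in this post, while the function names and the idea of passing in a log path are my own assumptions, not anything OMi ships.

```python
import re
from collections import Counter

# Patterns based only on the log excerpts quoted in this thread.
SIGNATURES = {
    "jms_timeout": re.compile(r"Timed out after waiting [\d,]+\s*ms for response"),
    "jms_exception": re.compile(r"JMSException", re.IGNORECASE),
    "sync_stopped": re.compile(r"Stopping eventsyncthread", re.IGNORECASE),
}


def scan_gateway_log(lines):
    """Count how many lines match each failure signature (at most once per line)."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Hypothetical helper: scan a log file at a path you supply."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_gateway_log(fh)
```

A non-zero `jms_timeout` count followed by `sync_stopped` would match the failure sequence described here; what to do with the counts (alert, restart, and so on) is left open.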
Please let me know the answers to the questions below:
1. How many events come into OMi per second, or every 5 minutes? Please install the OMi contrib pack and check this.
2. Are message storms often coming from a few monitored servers? This can be checked by filtering OMi events using the keyword "Storm".
We also had this issue. We saw more than 5000 events coming from certain servers; once we limited those events, our environment was fine.
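If you can export the events with a timestamp and the originating node, a per-node count over a short window is a quick way to spot a storm. Below is a minimal sketch in Python; the `(timestamp, node)` tuple format, the 5-minute window and the function name are assumptions for illustration, and the default threshold of 5000 simply reuses the figure mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_storm_nodes(events, window_end, window=timedelta(minutes=5), threshold=5000):
    """Return {node: count} for nodes exceeding `threshold` events in the window.

    `events` is an iterable of (timestamp, node) tuples (assumed export format).
    Only events with window_end - window < timestamp <= window_end are counted.
    """
    window_start = window_end - window
    counts = Counter(
        node for ts, node in events if window_start < ts <= window_end
    )
    return {node: n for node, n in counts.items() if n > threshold}
```

For example, `find_storm_nodes(events, datetime.now())` would list the nodes that sent more than 5000 events in the last 5 minutes; tune the window and threshold to whatever your environment treats as normal.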
We managed to correct this issue by stopping the Gateways, then the DP server and then removing everything inside the <OMi Home>\bus area of the DP server using the following commands:-
opr-support-utils.bat -bus -getserver
opr-support-utils.bat -bus -resetserver
Then a restart of ovc on the DP, then the Gateways followed by a start-up of the environment.
This brought everything back online almost immediately.
I have to say at this point OMi is one of the worst products I have ever used and I've used plenty of monitoring tools.
Thanks for the replies!