
Sudden problems with Node and Agent Topology Discovery


Hi

In my customer's OBM 2019.05 on Windows environment (with an active and a standby DPS and two redundant GWs behind an LB), the normal node and agent discovery function has been failing since the middle of last week.

After enabling LOG_LEVEL 10 for the OvSvcDiscServer process on the GWs and troubleshooting by restarting opcmsga on a couple of agents (we consistently use OA 12.11), I can see from the OvSvcDiscServer.log on one of the GWs that node data from the node in question is indeed received immediately. I can also see that an XML file called OvSDXmlBuffer<nnnn>.xml is created on the same GW, in the %OvDataDir%\shared\server\datafiles\svcdisc directory. But everything seems to stop there: the opr-svcdiscserver.log does not show any trace of the new discovery info from the node, and nothing whatsoever is written to the opr-svcdiscserver-citrace.log file.

Furthermore, the above-mentioned directory contains thousands of such OvSDXmlBuffer<nnnn>.xml files, most of them from the last 24 hours.

Last night, all of a sudden, a bunch of new nodes were discovered properly. I can see this in the normal OBM console under Monitored Nodes (they have detailed OA and OS version info etc.). I can even see from the opr-svcdiscserver-citrace.log on one of the GWs at what time last night they were added. So for approx. 15-20 minutes last night the discovery suddenly worked fine. Since then, however, new nodes are no longer discovered.

There seems to be a performance issue somewhere; I just do not know where. All "OMi Server Health" graphs indicate that Java heap utilization, CPU etc. are at perfectly normal levels.

Questions:
- Do you have any tips on how to troubleshoot this further?
- Could I remove all current files from the %OvDataDir%\shared\server\datafiles\svcdisc directory (after temporarily stopping the wde service)? I realize that I will lose buffered data, but since the OAs report new data via the ASSD function every 24 hours, I reckon this should not be a problem? See the sketch below for what I have in mind.
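For reference, the cleanup I have in mind would be roughly the following on each GW (just a sketch; please double-check the del command against your own paths before running anything in prod):

cd <HPBSM>\opr\support
opr-support-utils -stop wde
del "%OvDataDir%\shared\server\datafiles\svcdisc" /F /Q /S
opr-support-utils -start wde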

I would be most grateful for any help regarding this!

BR,
Frank Mortensen
Managon AB

6 Replies

Hello Frank,

I was in a similar situation at a customer and needed to clean out %OvDataDir%\shared\server\datafiles\svcdisc. After 24 hours, everything in RTSM was updated with correct discovered data.

So it should be no problem, but no warranty for this 😉


Thanks, Andreas!

I actually tried it this morning, after verifying in a test env. that it did not do any damage...

Just after starting the wde on the two GWs again, I was very happy to observe that many new nodes showed up. I also noticed that temp files were created and shortly afterwards removed from that svcdisc directory. The logs OvSvcDiscServer.log, opr-svcdiscserver.log and opr-svcdiscserver-citrace.log even indicated that nodes were in fact discovered and added. 5 minutes later, however, I noticed from the -citrace.log file on GW2 that no new CIs were created on that side, and the number of files in the svcdisc directory started to grow again 😞 15-20 minutes later exactly the same happened on GW1.

I still see a lot of activity in the OvSvcDiscServer.log file on both GWs, so the discovery and ASSD per se seem to work. But something seems to prevent the Mapping Service from processing the entries.

So the problem returned after a very short while, and new nodes are not discovered anymore 😞

I can still see a lot of activity in the opr-svcdiscserver.log on both GWs, but it only seems to contain a bunch of entries like this:

2020-04-22 11:19:16,780 [Thread-25] INFO  CMDBAdapter.updateBiosUuidProperty(373) - CMDBAdapter :: updateBiosUuidProperty() :: Trying to update Bios_Uuid information. 
2020-04-22 11:19:19,735 [Thread-25] INFO  CMDBAdapter.updateBiosUuidProperty(393) - CMDBAdapter :: updateBiosUuidProperty() :: Bios_Uuid information Updated. 
2020-04-22 11:19:19,735 [Thread-25] INFO  CMDBAdapter.updateBiosUuidProperty(373) - CMDBAdapter :: updateBiosUuidProperty() :: Trying to update Bios_Uuid information. 

These entries arrive in the log continuously, every 1-2 seconds. I have tried to change the log level to DEBUG (in the opr-svcdiscserver.properties file), but I still only see INFO entries in the log, i.e. no DEBUG entries. Do I need to restart wde for this log level change to take effect?
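For reference, this is roughly the change I made (the file location under <HPBSM>\conf\core\Tools\log\ and the loglevel key are my assumptions based on the usual BSM/OMi log configuration layout; verify against your own file):

# in opr-svcdiscserver.properties
# possible values: FATAL ERROR WARN INFO DEBUG ALL
loglevel=DEBUG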

I am getting rather desperate, and would be most grateful for any advice!

Cheers,
Frank


Hello Frank,

Try the following procedure to get more information about the service discovery issue.

Use the opr-tracegui command on the GW to enable extra TopoSync logging:

cd <HPBSM>\opr\support

opr-tracegui -enableAreaLogging GWS TopoSync

*************************************************

opr-support-utils -stop wde

Delete the OVSDInstances file and ovsdinstbuffer file:
del "%OvDataDir%\shared\server\datafiles\svcdisc" /F /Q /S

opr-support-utils -start wde

***************************************************

On the agent:

ovagtrep -clearall

run the svcdisc policy via CLI: ovagtrep -run {name_of_discovery_policy}

ovagtrep -publish
(to send the discovery results to the GW server)

After the discovery policy has run, the discovery results (XML) are in %OvDataDir%\tmp\agtrep.
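To check locally that the policy actually produced results before they reach the GW, you can list the generated XML files (a sketch):

dir /B "%OvDataDir%\tmp\agtrep"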

Please run the following on the GWs after "ovagtrep -publish" has completed on the node:

opr-tracegui -disableAreaLogging GWS TopoSync

opr-tracegui -collectAreaLogging GWS TopoSync

On the GW, in <HPBSM>\log\wde\, investigate these logs:

opr-svcdiscserver.log
opr-svcdiscserver-citrace.log

hope it helps,

Patrick


Thanks for the tip, Patrick!

I will try tracegui when returning to the customer tomorrow.

In our case it is just a matter of discovering the nodes and the agents (OA) by means of the ASSD data, so your steps regarding service discovery do not seem to apply here.

Cheers,
Frank 


Hi,

I have now tried the opr-tracegui way of debugging, and I found out two things:

One is that it unfortunately seems to do exactly what I have already done manually, i.e. setting the log level to DEBUG etc. in the appropriate file and then collecting the files.

In theory, that is... because, secondly, I found out that when I run the command with -enableAreaLogging (and even -disableAreaLogging) I only get Java error messages. This happens in both our prod and test environments, so it is consistent. Here is the output:

<OBMDir_here>\opr\support>opr-tracegui -enableAreaLogging GWS Toposync
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/text/StrLookup
at com.hp.opr.support.tracegui.LogConfig.<init>(LogConfig.java:79)
at com.hp.opr.support.tracegui.TraceGUI.loadTraceConfigurationFile(TraceGUI.java:2188)
at com.hp.opr.support.tracegui.TraceGUI.main(TraceGUI.java:1672)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.text.StrLookup
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
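The missing class (org.apache.commons.lang.text.StrLookup) comes from Apache Commons Lang 2.x, so I assume the tool's launcher simply does not have that jar on its classpath. An untested workaround sketch, assuming a commons-lang 2.x jar ships somewhere under the OBM installation and that the launcher honors CLASSPATH (it may well build its own classpath instead):

dir /S /B "<OBMDir_here>\lib\commons-lang*.jar"
set CLASSPATH=%CLASSPATH%;<path_to_jar_found_above>
opr-tracegui -enableAreaLogging GWS TopoSync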

I therefore just started the opr-tracegui console and tried the same thing from there, via the menus. But even this (as now expected...) generated Java error messages. The collection action worked, but it created a zip file containing the log files I had already looked into.

So thanks again for the tip, but unfortunately it did not get me any further 😞

Any other ideas, anyone?

Cheers,
Frank

Accepted Solution

Hi,

I eventually found the cause of our problems, which I reckon was also the cause of the large number of those Bios_Uuid update messages in the log. It turned out that we had a number of Node CIs in RTSM (25 of them) with an incredibly high number of contained interfaces (from a couple of hundred up to approx. 2000(!)). The excessive number of interfaces was caused by temporary interfaces being created locally on the nodes over time (this is an assumption) and by us not having enabled aging for Interface CIs (this is a fact).

Yesterday I manually deleted those Interface CIs in RTSM. Next week I will configure the aging mechanism for the Interface CIT as well, to make sure that we do not experience this problem again.
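For reference, the aging setup I have in mind looks roughly like this (attribute names and defaults are from the UCMDB/RTSM aging mechanism as I remember it; verify in your version):

In RTSM: CI Type Manager > Interface > set the default of the "Enable Aging" attribute to true, so that new Interface CIs are aged.
The aging-related CI attributes are then "Deletion Candidate Period" (default 20 days, if I recall correctly) and "Actual Deletion Period" (default 40 days).
Finally, make sure the aging mechanism itself is enabled globally in the RTSM settings.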

Thanks for all efforts!

Cheers,
Frank
