In our x86 Solaris environment we ran into an issue involving osagent. Our analysis shows that the problem occurs while initializing the ORB using the API call below:
Our program hangs in the above-mentioned CORBA API call for approximately 25 minutes. Only after those 25 minutes does program execution continue.
We then used the "truss" utility to trace the system calls of our program, which uses this CORBA API function, in both the non-working and the working scenario.
In the non-working scenario, the truss log shows that we continuously send DSRequest messages and receive no reply.
A DSRequest message in the truss output looks like this:
6.3351 sendto(14, 0x087A5A08, 160, 0, 0x08433578, 16) = 160
17128/2: \0\0\0A0 :\0\0\0\0\0\0\f D S R e q u e s t\0\0\0\0\0\0\f\0\0\010
17128/2: O R B e l i n e 2 . 0\0\0\0\0\0\0\001 V0180 v\0\0\004\0\0\0\0
17128/2: \0\0\0\0 :\0\0\0\0\0\0\b D S L o g i n\0\0\0\0\f\0\0\010 s e l i
17128/2: b s s 1 a d m 1\0\0\0\0\0\0\0\0\0\0\004\0\0\0\0\0\0\0\0\0\0\004
17128/2: \0\0\0\0\0\0\006\0\0\0\b n m s a d m\0\0\0\0 BE8\0\0\002\0\0A1E8
17128/2: AF_INET to = XXX.XXX.XXX.XXX port = XXXXX // hiding IP/port information intentionally.
Below is the output of the pstack command on our process while it was stuck in the ORB_init function call:
fe399329 lwp_park (0, 0, 0)
fe393b23 cond_wait_queue (816519c, 8165180, 0) + 5e
fe393ff7 _cond_wait (816519c, 8165180) + 64
fe394039 cond_wait (816519c, 8165180) + 21
fe394072 pthread_cond_wait (816519c, 8165180) + 1b
fe7c1496 __1cMVISConditionEwait6MrnIVISMutex__v_ (8165198, 816517c) + 2a
fdf98f63 __1cGDSUserEopen6Fpkc2i_p0_ (0, 0, 36b0) + 3d3
fdf967c1 __1cGDSUserIinstance6Fi_p0_ (1) + 5d
fdfb3c18 _vbroker_dsuser_loading (0) + 40
fe7c8393 __1cKVISLibraryKinitialize6Mpv_i_ (8164558, 0) + 37
fe7c86af __1cOVISModuleAdminKforeach_do6MmJVISModule_Mpv_i1_i_ (814ca98, fe7c87ac, 0, 0) + 83
fe7c875b __1cOVISModuleAdminSinitialize_modules6Mpv_i_ (814ca98, 0) + 2b
fea8a267 __1cKVISManagerKORB_inited6MpnGVISORB__v_ (8141058, 80f5c60) + 10f
fea06338 __1cFCORBAIORB_init6Fripkpcpkc_pnJCORBA_ORB__ (802cb2c, 80f5c30, 0) + 218
08064eb9 main (7, 802e430, 802e454) + 1519
080629da _start (8, 802ec78, 802ec98, 802eca8, 802ed1f, 802ed62) + 7a
In the working scenario, however, the truss log shows that we receive a DSReply immediately after sending the DSRequest. See the snapshot below:
3.4098 sendto(14, 0x085EFA08, 156, 0, 0x0830A578, 16) = 156
17752/2: \0\0\09C :\0\0\0\0\0\0\f D S R e q u e s t\0\0\0\0\0\0\f\0\0\010
17752/2: O R B e l i n e 2 . 0\0\0\0\0\0\0\001 V02 \ \0\0\004\0\0\0\0
17752/2: \0\0\0\0 :\0\0\0\0\0\0\b D S L o g i n\0\0\0\0\t\0\0\0\f o s s m
17752/2: a s t e r\0\0\0\0\0\0\0\0\0\004\0\0\0\0\0\0\0\0\0\0\004\0\0\0\0
17752/2: \0\0\004\0\0\0\b r o o t\0\0\0\0\0\0 E X\0\0\002\0\0B4A2
17752/2: AF_INET to = 192.168.0.16 port = 33407
17752/2: 3.4101 pollsys(0x08550968, 1, 0xFD0EEAD8, 0x00000000) = 1
17752/2: fd=14 ev=POLLRDNORM rev=POLLRDNORM
17752/2: timeout: 2.999000000 sec
17752/2: 3.4102 recvfrom(14, 0x08202048, 8192, 0, 0xFD0EEA30, 0xFD0EEA0C) = 84
17752/2: \0\0\0 T :\0\0\0\0\0\0\b D S R e p l y\0\0\0\0\f\0\0\010 O R B e
17752/2: l i n e 2 . 0\0 2 . 0\0\0\002 V02B7\0\0\0\004\0\0\0\0\0\0\002
17752/2: V02 \ \0\0\0\0 :\0\004\0\0\004 : :\0 e
17752/2: AF_INET from = XXX.XXX.XXX.XXX port = XXXXX // hiding IP/port information intentionally.
From the above logs we concluded that in the non-working scenario the osagent (CORBA Smart Agent) is most probably not reachable over the network, which is why we get no reply.
Note also that even after 25 minutes we did not receive a "D S R e p l y" message in the non-working scenario, yet process execution continued anyway.
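To check this observation across a whole truss log rather than by eye, the requests and replies can be counted with a small script. The following is only a sketch: the function name `count_ds_messages` is made up for illustration, and it assumes the log looks like the excerpts in this post, with a "pid/lwp:" prefix on each line and printable bytes separated by spaces.

```python
import re

# Matches the "pid/lwp:" prefix seen on the truss lines above, e.g. "17128/2:".
_PREFIX = re.compile(r"^\s*\d+/\d+:\s?")


def count_ds_messages(truss_text):
    """Count DSRequest and DSReply markers in a truss data dump.

    Hypothetical helper: it only looks for the literal marker strings and
    does not decode the message format.
    """
    # Drop the per-line prefix so a marker split across two dump lines
    # (for example "D S R e p" / "l y") is joined back together.
    payload = "".join(_PREFIX.sub("", line) for line in truss_text.splitlines())
    # truss prints printable bytes separated by spaces ("D S R e p l y"),
    # so remove all whitespace before searching.
    flat = re.sub(r"\s+", "", payload)
    return {
        "DSRequest": flat.count("DSRequest"),
        "DSReply": flat.count("DSReply"),
    }
```

A log in which the DSRequest count keeps growing while the DSReply count stays at zero matches the non-working scenario described above.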
The first thing I want to know is whether our observation about the above scenario is correct, as I am not clear what exactly DSRequest/DSReply mean. Is the hang really due to osagent being unavailable over the network?
To resolve the issue, we are now passing the flag below to the CORBA::ORB_init API call:
We are using the above flag because we heard somewhere that osagent (VisiBroker Smart Agent) has been decommissioned and will no longer be supported. Since osagent is decommissioned from VisiBroker, we use the flag to avoid communication with the Smart Agent.
So my primary question is: has osagent (VisiBroker Smart Agent) really been decommissioned, and will Micro Focus really no longer support it in the future?
Any help with this issue will be appreciated.
Have you tried running the 'osfind' command on the machine that shows the startup delay? Osagent is an integral part of the VisiBroker CORBA implementation and is fully supported. It sounds like the delay is caused by a router/firewall configuration issue. osfind lets you quickly identify connectivity issues like this, so you can work with your network IT team to confirm you have the port access required for proper VisiBroker operation.
In reply to scott.kay:
Thanks for your super fast response. We tried your suggestion of running the 'osfind' command, and this is the result:
osfind: There are no agents running in your domain.
But with the lsof command we can see osagent listening on UDP port 32908 on the same server.
Here is the lsof command and output:
> lsof -ni udp:32908
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
osagent 10876 xxxxxxx 5u IPv4 0xfffffea23554de00 0t0 UDP *:32908
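For comparing against other settings, the listening port can be pulled out of this lsof output programmatically. This is a rough sketch that assumes the nine-column layout shown above; `osagent_udp_ports` is a hypothetical helper name, not part of any VisiBroker tool.

```python
def osagent_udp_ports(lsof_output):
    """Return the sorted UDP ports on which lsof reports osagent sockets.

    Assumes the column layout shown above:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    ports = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not an osagent UDP socket.
        if len(fields) < 9 or fields[0] != "osagent" or fields[-2] != "UDP":
            continue
        # NAME looks like "*:32908"; the port follows the last colon.
        port = fields[-1].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)
```

Fed the two output lines above, the helper returns a list containing only 32908.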
Any insights into what else we should check?
In reply to motogeeeksatyam:
Please check the environment variable $OSAGENT_PORT. It should be the same as the port osagent is listening on. osfind looks for an osagent on your network that is listening on that port.
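As an illustration of this check, the sketch below compares the port from the environment with the port osagent was seen listening on. The helper names are hypothetical, and the fallback of 14000 when OSAGENT_PORT is unset is an assumption based on the commonly documented VisiBroker default, so verify it against your installation.

```python
import os

# Assumption: 14000 is the commonly documented default osagent port when
# OSAGENT_PORT is not set. Verify this for your VisiBroker version.
DEFAULT_OSAGENT_PORT = 14000


def effective_osagent_port(environ=os.environ):
    """Return the port taken from OSAGENT_PORT, or the assumed default."""
    raw = environ.get("OSAGENT_PORT")
    if raw is None or not raw.strip():
        return DEFAULT_OSAGENT_PORT
    return int(raw)


def port_matches(listening_port, environ=os.environ):
    """True if the environment points at the port osagent listens on."""
    return effective_osagent_port(environ) == listening_port
```

For example, `port_matches(32908)` run in the same shell as osfind shows whether that shell's OSAGENT_PORT agrees with the port seen in the lsof output.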
However, if your application does not use osagent, you can set the property vbroker.agent.enableLocator=false to disable all communication with osagent. That will also reduce the lag/slowness when your application cannot find the osagent.
Should you need further assistance, we advise that you open an incident with us. Log in to supportline.microfocus.com and use your Support Serial Number (not the VisiBroker license key) to create an incident.