JGroups error found in SM logs

Hi All,

We are using SMA - SM:
Service Manager 9.52 - Windows Server
Service Portal 2018.08

I would appreciate your help in resolving the issue below. As I understand it, JGroups is used for communication between SM servers in a horizontally scaled setup.

This is causing my email integration to stop. We are using Connect-It 9.60 for inbound integration. Everything works fine until the errors below appear; whenever our integration stops working, these error messages are in the log.

I have removed the unwanted adapters on the SM application servers (Windows) and could not find any abnormal entries in the etc/hosts file, but I am still seeing these error messages. Has anyone faced a similar error and found a fix?

4280( 7684) 07/21/2019 12:53:48 JRTE W Session 1531C77E34F68D1E1C8CCC56FB1C0ADB is no longer valid. Sending SOAP fault
4280( 7684) 07/21/2019 12:53:48 JRTE W Send error response: Session no longer valid
4280( 7780) 07/21/2019 12:54:03 BT-VWP-HPSMA-02-4280: no physical address for a9269ee6-2406-fcc2-08ad-a966d38a929d, dropping message
4280( 7780) 07/21/2019 12:54:03 [JGRP00011] BT-VWP-HPSMA-02-4280: dropped message 185,480 from non-member cca4ccae-fbac-3c40-bcc6-c1ed6ef222b0 (view=[BT-VWP-HPSMA-02-4280|3] [BT-VWP-HPSMA-02-4280, BT-VWP-HPSMA-02-7852, BT-VWP-HPSMA-02-3572, BT-VWP-HPSMA-02-212, BT-VWP-HPSMA-02-6664])
4280( 7780) 07/21/2019 12:54:32 BT-VWP-HPSMA-02-4280: no physical address for a9269ee6-2406-fcc2-08ad-a966d38a929d, dropping message
4280( 7780) 07/21/2019 12:54:54 BT-VWP-HPSMA-02-4280: no physical address for a9269ee6-2406-fcc2-08ad-a966d38a929d, dropping message
4280( 688) 07/21/2019 12:55:09 [JGRP00011] BT-VWP-HPSMA-02-4280: dropped message 185,514 from non-member cca4ccae-fbac-3c40-bcc6-c1ed6ef222b0 (view=[BT-VWP-HPSMA-02-4280|3] [BT-VWP-HPSMA-02-4280, BT-VWP-HPSMA-02-7852, BT-VWP-HPSMA-02-3572, BT-VWP-HPSMA-02-212, BT-VWP-HPSMA-02-6664]) (received 35 identical messages from cca4ccae-fbac-3c40-bcc6-c1ed6ef222b0 in the last 66,155 ms)

Regards,

Madhan
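
To see which sender is behind the `[JGRP00011]` lines, the excerpt above can be tallied per non-member UUID. The following is only a rough sketch: the regular expression is derived from the log lines quoted in this post, and the function name `count_dropped_by_sender` is made up for illustration.

```python
import re
from collections import Counter

# Pattern based on the "[JGRP00011] ... dropped message N from non-member <uuid>"
# lines quoted in this thread; other SM versions may format them differently.
DROPPED = re.compile(
    r"\[JGRP00011\].*?dropped message [\d,]+ from non-member ([0-9a-f-]{36})"
)

def count_dropped_by_sender(lines):
    """Return a Counter mapping each non-member UUID to its number of JGRP00011 lines."""
    counts = Counter()
    for line in lines:
        match = DROPPED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `count_dropped_by_sender(open("sm.log", errors="replace"))`, where `sm.log` is a placeholder for your own log path.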

  • I also had similar issues and fixed them by adjusting the parameters in sm.ini and sm.cfg.
    Make sure all settings/parameters are set correctly, especially gossiprouter, groupname, system, and gossiprouterhosts.
    Take a look at the manual: https://docs.microfocus.com/SM/9.60/Codeless/Content/serversetup/concepts/Configure_Jgroups_on_TCP_in_a_horizontal_envt.htm
    and review your sm.ini and sm.cfg. If you still have problems, share your sm.ini, sm.cfg, and information about your network interfaces, and I can help you.
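
A quick way to review sm.ini on each node is to list which of the parameters you care about are absent or commented out. This is a minimal sketch, assuming the file holds one `name:value` pair per line and that lines starting with `#` are comments; the helper names are hypothetical, and which parameters are actually required should be taken from the manual linked above.

```python
def read_ini_parameters(text):
    """Parse 'name:value' lines into a dict keyed by lowercased name.

    Blank lines, '#' comment lines, and lines without ':' are skipped.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        params[name.strip().lower()] = value.strip()
    return params

def missing_parameters(text, wanted):
    """Return the names from `wanted` that are not set (case-insensitive)."""
    params = read_ini_parameters(text)
    return [name for name in wanted if name.lower() not in params]
```

For example, `missing_parameters(open("sm.ini").read(), ["groupname", "gossiprouterhosts"])` returns an empty list when both are set. Names are compared case-insensitively only so that different spellings of the same parameter match in this check.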
  • Thanks Breno,

    Currently JGroups is using the UDP protocol. I will change it to TCP and let you know the result.

    Regards,

    Madhan 

  • Hi Breno,

    I tried to configure it but had no luck. I got the error messages below while trying on the primary server.

    I am also attaching my sm.ini and sm.cfg files.

    6908( 2312) 07/24/2019 20:46:39 JRTE I Starting TRCLIENT thread
    6908( 2312) 07/24/2019 20:46:39 JRTE I Waiting for TRCLIENT() to initialize.
    6908( 7856) 07/24/2019 20:46:39 RTE I Using "utalloc" memory manager, mode [0]
    6908( 7856) 07/24/2019 20:46:39 RTE I Process sm 9.52.2021 (P2) System: 50443 (0x784DFB00) on PC (x64 64-bit) running Windows (6.2 Build 9200) Timezone GMT 03:00 Locale en_US from ServerHost(removed actual servername)
    6908( 7856) 07/24/2019 20:46:39 RTE I Host network address: 10.10.10.171
    6908( 7856) 07/24/2019 20:46:39 RTE I Thread attaching to resources with key 0x784DFB00
    6908( 7856) 07/24/2019 20:46:39 JRTE I ServerSession is created with threadid 7856
    6908( 2312) 07/24/2019 20:46:39 JRTE I JGroups 3.6.2.Final

    6908( 2312) 07/24/2019 20:46:39 JRTE I JGroups 3.6.2.Final
    6908( 2312) 07/24/2019 20:46:40 failed connecting to ServerHost(removed actual servername)/10.10.10.171:7801: java.lang.Exception: Could not connect to ServerHost(removed actual servername)/10.10.10.171:7801
    6908( 2312) 07/24/2019 20:46:41 failed reconnecting stub to GR at ServerHost(removed actual servername)/10.10.10.171:7801: java.lang.Exception: Could not connect to ServerHost(removed actual servername)/10.10.10.171:7801
    6908( 2312) 07/24/2019 20:46:42 failed reconnecting stub to GR at ServerHost(removed actual servername)/10.10.10.171:7801: java.lang.Exception: Could not connect to ServerHost(removed actual servername)/10.10.10.171:7801
    6908( 2312) 07/24/2019 20:46:42 failed fetching members from ServerHost(removed actual servername)/10.10.10.171:7801: java.lang.Exception: Connection to ServerHost(removed actual servername)/10.10.10.171:7801 broken. Could not send GOSSIP_GET request, cause: java.lang.Exception: not connected
    6908( 2312) 07/24/2019 20:46:53 failed reconnecting stub to GR at ServerHost(removed actual servername)/10.10.10.171:7801: java.lang.Exception: Could not connect to ServerHost(removed actual servername)/10.10.10.171:7801

  • I think the error occurs because you are using port 7800 for the GossipRouter, and this port is the default for each member. As a test, set a different port for the GossipRouter and remove grouptcpbindport. Also, to rule out name resolution problems, use the IP address instead of the host name.

    sm.cfg

    sm -GossipRouter -Gossiprouterport:12001 -log:../logs/gossiprouter.log

     

    sm.ini

    groupname:hpservice
    jgroupstcp:1
    GossipRouterhosts:10.10.10.171[12001]
    groupbindaddress:10.10.10.171
    #grouptcpbindport:7802

     

    Let me know
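
Before restarting the SM nodes, it may help to confirm from each node that the GossipRouter port is reachable at all. Below is a small sketch that only uses the standard library: it parses a `GossipRouterhosts` value in the `host[port]` form shown above and attempts a plain TCP connection. The function names are hypothetical, and the comma-separated form for several routers is an assumption, since only a single entry appears in this thread.

```python
import re
import socket

def parse_gossip_hosts(value):
    """Split 'host[port],host[port]' into (host, port) tuples.

    Assumption: several routers would be comma-separated; only a single
    'host[port]' entry is shown in this thread.
    """
    hosts = []
    for entry in value.split(","):
        match = re.fullmatch(r"\s*(.+?)\[(\d+)\]\s*", entry)
        if match:
            hosts.append((match.group(1), int(match.group(2))))
    return hosts

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the settings above, `can_connect(*parse_gossip_hosts("10.10.10.171[12001]")[0])` should return True once the GossipRouter process is listening. A False result points to the router not running, a different port, or a firewall in between, rather than to JGroups itself.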

  • Hi Bruno,

    I am also facing the same issue and made the suggested modifications, but I am still getting the errors below.

    IncentivesQA.zip
  • Could you share the log error?
  • Thanks Bruno, I have resolved this error by clearing out the scdb system entry from the database and restarting the services on both nodes. It started working fine.

    This particular issue may not be relevant to this topic, but every Friday, in our lower environments, the sm.exe process consumes 100% CPU, which forces us to restart the server. We are on 9.60 and recently installed the Oracle 12c client on our dev and test servers. Apart from that, we have not made any changes to our Dev and QA servers. What is causing the sm.exe process to consume 100% CPU? Any ideas would be very helpful.
  • That is strange. Are there any hints in the log? Are you able to connect to SM at such times? If so, go to System Status, then System Monitor, and check which process is consuming your CPU; if necessary, enable a trace on it to figure out what is going on.