Intermittent LG failure connecting to SaaS Performance Center

Hello
I'm hoping you can help me resolve the following errors. I'm getting them intermittently; when they occur, any running test falls over, and I need to reboot the load generator host from Performance Center before I can resume.

I have executed the same test at different times, so I believe my scenario is OK. I just cannot find any relevant reason or solution for why these errors occur, and therefore I don't know how to resolve them.

We have other load generators that connect to the SaaS environment without problems; they run the same versions, and we do not encounter these issues on them.

Any guidance you could provide would be appreciated.

Errors in LoadRunner_agent_service.log
30/01/2017 16:44:01 Error: Communication error: SSL write error : [param not passed in call]. (sys error message - WSAECONNABORTED) [MsgId: MERR-10343]
30/01/2017 16:44:01 Error: Client almrwcpcmt125-mil5.saas.hpe.com is not responding. [MsgId: MERR-29992]
30/01/2017 16:44:01 Error: Communication error: The Client failed to send packet. The socket has been shut down. [MsgId: MERR-10343]
30/01/2017 16:44:01 Error: Communication error: The Client failed to send packet. The socket has been shut down. [MsgId: MERR-10343]
30/01/2017 16:44:01 Error: Two Way Communication Error: Function two_way_comm_post_message / two_way_comm_post_message_ex failed. [MsgId: MERR-60990]
30/01/2017 16:44:01 Error: Communication error: The Client failed to send packet. The socket has been shut down. [MsgId: MERR-10343]


Errors in RemoteManagement_agent_service.log
30/01/2017 09:46:31 Error: Communication error: The client SSL certificate is not trusted by the server. [MsgId: MERR-10343]
30/01/2017 09:46:31 Error: Two Way Communication Error: Function two_way_comm_resolve_userdata failed. Reason: invalid handle. [MsgId: MERR-60985]
30/01/2017 09:46:31 Error: Communication error: SSL write error : [param not passed in call]. (sys error message - WSAECONNRESET) [MsgId: MERR-10343]
30/01/2017 09:46:31 Error: Communication error: SSL write error : [param not passed in call]. (sys error message - WSAECONNRESET) [MsgId: MERR-10343]
30/01/2017 09:46:31 Error: Communication error: SSL write error : [param not passed in call]. (sys error message - WSAECONNRESET) [MsgId: MERR-10343]
30/01/2017 10:15:44 Error: Two Way Communication Error: Function two_way_comm_resolve_userdata failed. Reason: invalid handle. [MsgId: MERR-60985]
30/01/2017 14:05:40 Error: Communication error: Failed to connect to remote host [server full name: almrwcpcmt125-mil5.saas.hpe.com]. [MsgId: MERR-10343]
30/01/2017 14:05:40 Error: Two Way Communication Error: Function two_way_comm_resolve_userdata failed. Reason: invalid handle. [MsgId: MERR-60985]
30/01/2017 14:05:40 Error: Communication error: Failed to send message - socket is not connected yet. [MsgId: MERR-10343]
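
Since the timestamps and MsgIds are the main clues here, it can help to summarize a longer log by message ID, by Winsock error name, and by second, so that bursts of errors stand out. The following is a minimal Python sketch, assuming the agent logs keep the `DD/MM/YYYY HH:MM:SS Error: ... [MsgId: ...]` line format shown above; the function name `summarize` is hypothetical and not part of LoadRunner.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts above:
# 30/01/2017 16:44:01 Error: <text> [MsgId: MERR-10343]
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) Error: "
    r"(?P<text>.*?)\s*\[MsgId: (?P<msgid>[A-Z]+-\d+)\]\s*$"
)
# Winsock error names such as WSAECONNRESET appear inside the message text.
SYSERR_RE = re.compile(r"sys error message - (WSAE[A-Z]+)")


def summarize(lines):
    """Count error lines per MsgId, per Winsock error name, and per second."""
    by_msgid, by_syserr, by_second = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore headers, blank lines, and other formats
        by_msgid[match.group("msgid")] += 1
        by_second[match.group("stamp")] += 1
        sys_err = SYSERR_RE.search(match.group("text"))
        if sys_err:
            by_syserr[sys_err.group(1)] += 1
    return by_msgid, by_syserr, by_second


if __name__ == "__main__":
    # To scan a real file instead, pass open(path, errors="replace") to summarize().
    sample = [
        "30/01/2017 09:46:31 Error: Communication error: SSL write error : "
        "[param not passed in call]. (sys error message - WSAECONNRESET) [MsgId: MERR-10343]",
        "30/01/2017 09:46:31 Error: Two Way Communication Error: Function "
        "two_way_comm_resolve_userdata failed. Reason: invalid handle. [MsgId: MERR-60985]",
        "30/01/2017 14:05:40 Error: Communication error: Failed to send message - "
        "socket is not connected yet. [MsgId: MERR-10343]",
    ]
    msgids, syserrs, seconds = summarize(sample)
    print(msgids.most_common())
    print(syserrs.most_common())
    print(seconds.most_common(1))
```

Grouping by second makes it easy to see that each failure shows up as a cluster of errors logged at the same timestamp.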

  • Hi Claire,

    Thanks for the update, and kudos for the impressive drill-down.

    It does seem strange and unrelated. Let me consult with subject matter experts internally. I will get back to you soon.

    Regards,

    Shlomi

     

  • Hi Claire,

    I have asked experts in the team, and they came up with a hypothesis about why it might be related. I'd like to check with you whether that assumption is correct.

    Does the Load Generator connect to the MI Listener using HTTP tunneling through the Apache proxy? If so, the configuration change made in Apache/WebLogic might be related.

    In any case, please make sure the timeout you defined is higher than the one defined in the LG agent settings.

    Regards,

    Shlomi

  • Hi Shlomi,

    Thanks for your response. On my LG, the agent is configured with 'Enable Firewall Agent':

    MI Listener Name is set
    Local Machine Name is set
    Connection Timeout (seconds) is 30
    Connection Type is TCP
    Use Secure Connection (SSL) is checked.
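
    To make Shlomi's two checks explicit, here is a small Python sketch that records the settings listed above and tests them. It rests on two assumptions: that the Apache/WebLogic timeout is known in seconds (the value used below is hypothetical), and that only the HTTP connection type tunnels through the proxy. The class and function names are illustrative, not LoadRunner APIs.

```python
from dataclasses import dataclass


@dataclass
class AgentSettings:
    """Firewall-agent settings as described in the post (field names are hypothetical)."""
    connection_timeout_s: int
    connection_type: str  # "TCP" or "HTTP"
    use_ssl: bool


def proxy_in_path(agent: AgentSettings) -> bool:
    """Assumption: only the HTTP connection type tunnels through the Apache proxy."""
    return agent.connection_type.strip().upper() == "HTTP"


def timeout_ok(agent: AgentSettings, proxy_timeout_s: int) -> bool:
    """Shlomi's rule of thumb: the proxy/WebLogic timeout must exceed the agent's."""
    return proxy_timeout_s > agent.connection_timeout_s


if __name__ == "__main__":
    lg = AgentSettings(connection_timeout_s=30, connection_type="TCP", use_ssl=True)
    print("proxy in path:", proxy_in_path(lg))  # False for a TCP connection
    print("timeout ok with a hypothetical 60 s proxy timeout:", timeout_ok(lg, 60))
```

    If the tunneling assumption holds, a TCP connection type would mean the Apache proxy is not in the path, which would argue against that particular hypothesis.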

    I now have a ticket open with the support team. Unfortunately, adjusting the timeout doesn't appear to have a consistent effect, and I'm unable to get any further information from the LR agent logs.

     

    I should mention that when this situation arises, SaaS Performance Center reports the following type of error:

    -27778 [GENERAL_MSG_CAT_SSL_ERROR] read to host <my host name> failed [10053].  Software caused connection abort.
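
    The number in brackets is a standard Windows Sockets error code: 10053 is WSAECONNABORTED, the same condition that appears in the agent log, which Microsoft describes as a connection aborted by software on the local host, possibly due to a data transmission time-out or protocol error. A minimal Python sketch for decoding such codes from the SaaS message follows; the lookup table covers only a few common codes, and the regular expression assumes the `failed [NNNNN]` wording shown above.

```python
import re

# Standard Winsock error codes (a small subset).
WINSOCK_ERRORS = {
    10053: ("WSAECONNABORTED", "Software caused connection abort"),
    10054: ("WSAECONNRESET", "Connection reset by peer"),
    10057: ("WSAENOTCONN", "Socket is not connected"),
    10060: ("WSAETIMEDOUT", "Connection timed out"),
    10061: ("WSAECONNREFUSED", "Connection refused"),
}

# Assumed wording: "... read to host <name> failed [10053]. ..."
CODE_RE = re.compile(r"failed \[(\d+)\]")


def decode_socket_error(message: str):
    """Return (code, name, description) for the first bracketed code, or None."""
    match = CODE_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    name, description = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return code, name, description


if __name__ == "__main__":
    msg = ("-27778 [GENERAL_MSG_CAT_SSL_ERROR] read to host <my host name> "
           "failed [10053].  Software caused connection abort.")
    print(decode_socket_error(msg))
```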

  • Hi Claire,

    In the past, when I dealt with this myself, I used to reduce the timeout to 5 seconds to make sure the agent always connects to the MI Listener. I'm not sure it will help in this case, but it's worth a try.

    Now that you have a ticket open, please share a link to this post with the support team. I believe it will be easier to handle this case by looking at your system directly.

    Regards,

    Shlomi

  • Verified Answer

    Just an update: we have resolved this issue locally.

    We ran a Wireshark trace, filtered on traffic between our LG and the MI Listener, and observed a large number of packet retransmissions and similar errors accumulating in the run-up to the failure.
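
    For anyone wanting to repeat this check from the command line, the same filtering can be done with `tshark`, Wireshark's command-line tool. The sketch below is a minimal Python wrapper, assuming `tshark` is on the PATH; the IP addresses and capture file name are hypothetical placeholders. `tcp.analysis.retransmission` matches retransmitted segments only; `tcp.analysis.flags` is a broader filter that also catches duplicate ACKs, zero windows, and similar anomalies.

```python
import shutil
import subprocess


def retransmission_filter(lg_ip: str, mi_ip: str) -> str:
    """Display filter: TCP retransmissions on traffic between the LG and the MI Listener."""
    return f"ip.addr == {lg_ip} && ip.addr == {mi_ip} && tcp.analysis.retransmission"


def tshark_command(pcap_path: str, lg_ip: str, mi_ip: str) -> list:
    # -r reads a saved capture, -Y applies a display filter, and
    # -T fields -e frame.number prints one frame number per matching packet.
    return ["tshark", "-r", pcap_path, "-Y", retransmission_filter(lg_ip, mi_ip),
            "-T", "fields", "-e", "frame.number"]


def count_retransmissions(pcap_path: str, lg_ip: str, mi_ip: str) -> int:
    """Run tshark and count the packets that match the filter."""
    if shutil.which("tshark") is None:
        raise RuntimeError("tshark was not found on the PATH")
    result = subprocess.run(tshark_command(pcap_path, lg_ip, mi_ip),
                            capture_output=True, text=True, check=True)
    return sum(1 for line in result.stdout.splitlines() if line.strip())


if __name__ == "__main__":
    # Hypothetical addresses and file name; replace with your own capture.
    print(" ".join(tshark_command("lg_to_mi.pcapng", "10.0.0.5", "10.0.0.9")))
```

    Comparing the count from a failing LG with the count from a healthy one over a similar test window gives a rough, like-for-like signal.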

    Running a test on another LG did not generate any issues.

    We then noticed that the original LG had a link speed of only 100 Mbps, whereas the second LG had a link speed of 1 Gbps. After speaking with our network team, we established that there may have been an issue with the switch serving our lab in recent weeks, which I wasn't previously aware of. We hooked the LG into a 1 Gbps link, and the test once again completes successfully without any issue.
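
    Since the root cause was a 100 Mbps link where 1 Gbps was expected, a quick comparison of NIC speeds across the LGs might catch this earlier. This is a minimal Python sketch, assuming you have already collected each host's link speed in Mbps (for example, with the third-party `psutil` package, where `psutil.net_if_stats()[nic].speed` reports the speed in megabits and 0 when it cannot be determined); the host names below are hypothetical. It also converts a link speed into its theoretical byte-rate ceiling: 100 Mbps is at most 12.5 MB/s before protocol overhead.

```python
def classify_links(speeds_mbps: dict, expected_mbps: int = 1000):
    """Split hosts into (slow, unknown) lists based on their reported link speed."""
    slow, unknown = [], []
    for host, speed in sorted(speeds_mbps.items()):
        if speed <= 0:
            unknown.append(host)  # 0 means the speed could not be determined
        elif speed < expected_mbps:
            slow.append(host)
    return slow, unknown


def max_megabytes_per_second(link_mbps: float) -> float:
    """Theoretical ceiling in MB/s (8 bits per byte, ignoring protocol overhead)."""
    return link_mbps / 8


if __name__ == "__main__":
    # Hypothetical host names; speeds mirror the two LGs described above.
    lab = {"lg-original": 100, "lg-second": 1000}
    print(classify_links(lab))  # (['lg-original'], [])
    print(max_megabytes_per_second(100))  # 12.5
```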

    The 'network performance' view on the LG didn't really highlight any capacity issues. It wasn't until we studied the Wireshark trace and investigated further that we noticed the link speed issue.

  • Thanks a lot for sharing, Claire. Very useful information.

    Glad to hear the issue is behind us.

    Regards,

    Shlomi