Slow gwia restart

Hello Members,

Every time I modify the settings of a GWIA, it has to reload itself. That's normal, I can accept it. But every time it takes more than 10 minutes. Once I waited for more than an hour. A service outage of 10 minutes is not acceptable at all. There is a GWIA switch called "Kill threads on exit or restart". It is set to true. I assume this should kill every thread instantly instead of gracefully waiting for them to terminate. Unfortunately, I see the following in the logs hundreds of times:
10:39:41 0DA3 Waiting for GWIMAP-Handler_12 to terminate...
10:39:43 0DA3 Shutdown of Threads

So my question is: how can I speed up the restart of a GWIA? Why is killthreads not working? I know I can do an rcgrpwise restart, but I have 25 GWIAs. Restarting them one by one manually is not an enterprise solution.

Regards,
Gellért

  • In article <gehorvath.7tunrz@no-mx.forums.microfocus.com>, Gehorvath
    wrote:
    > Every time I modify the settings of a gwia, it has to reload itself.
    > That's normal, I can accept it. But every time it takes more than 10
    > minutes. Once I've waited for more than an hour. This is not acceptable.


    Hi Gellért,
    Agreed, not acceptable, and I've never seen it take more than 2 minutes,
    so something is off.

    So let's start with the basics:
    Which build of GWIA is in use? Is it the same on all of them?
    What OS / version of OS are you running on? ('Pure' SLES vs OES. "cat
    /etc/*release")
    What is CPU, RAM, and Disk usage like normally and when you are
    restarting GWIA? http://www.konecnyad.ca/andyk/nixadmin.htm

    What else are these systems doing? Are they primarily for GWIA, or are
    they running many other things that might cause resource contention?

    Does the rcgrpwise restart of your GWIAs go quickly, or is it slow as
    well? If slow, how about a full rcgrpwise stop of the GWIA?

    Interestingly, "Kill threads on exit or restart" doesn't leave an
    indication in the logs the way so many other settings do. You can
    confirm that it is showing on GWIA's web console on the Configuration
    page, under SMTP/MIME Settings. Worth confirming that it is actually set.

    If you set GWIA's log level to diagnostic, does anything show in the
    log about the restart? Verbose mode doesn't show anything on a full
    reboot, which is supposed to be the equivalent of an rcgrpwise stop.


    Andy of
    http://KonecnyConsulting.ca in Toronto
    Knowledge Partner
    http://forums.novell.com/member.php/75037-konecnya
    If you find a post helpful and are logged in the Web interface, please
    show your appreciation by clicking on the star below. Thanks!

  • Hello Andy,

    There are 5 dedicated GWIAs with their own domains on separate virtual machines. The system is GW 2014 R2 SP1 HP2 (build 125534); four machines run SLES 12 SP1 64-bit (latest updates) and one runs SLES 11 SP4 (I will reinstall this server with SLES 12 soon). These are the most problematic ones: 2 for SMTP and 3 for IMAP. IADOM1 and GWIA1 are on box1, IADOM2 and GWIA2 are on box2, and so on. There is nothing else on these machines, just GW. The dedicated IMAP servers have the slow restart problem during work hours. There are 2 load balancers (ipvsadm), one for IMAP and one for SMTP.
    Everything is on VMware. 2-4 CPUs, 4 GB memory. Load is under 1 all the time.
    A full rcgrpwise stop is also slowish, but not as slow as a "Restart Internet Agent" on the web interface. rcgrpwise stop does its job in a minute or so. This is acceptable.
    --killthreads is there in all the gwia.cfg files and in GWAC. The web interface also shows "Kill threads on exit or restart: Yes"
    IMHO the problem is that GWIA cannot close the open IMAP threads in time. The question is why? If there are no open IMAP sessions (long after work hours), then the restart is reasonably fast, under 1 minute. The IMAP servers are busy. According to the logs, about 700 unique users access their mailboxes over these IMAP servers. 20 IMAP threads are configured on each GWIA, and sometimes all 20 threads on one GWIA are busy (but not all 60 threads at once).
    I will try the diagnostic logging on Wednesday and see if there is something in it.

    TIA,
    Gellért
  • Hi Gellért,

    It would be helpful to know more about what is happening when you restart your GWIA(s). As Andy suggested, diagnostic logging is a good place to start.

    When you restart your GWIA, there could be some bottlenecks which impact performance during a period of increased activity. The bottlenecks may not just be within GroupWise!


    • Do you know if the major part of the delay is due to GWIA shutting down or starting up?
    • Are the VMs running your GWIAs on the same host as your POs?
    • Is there inter-host IO or is IO confined to a single ESXi host? Do these two scenarios affect restart times?
    • Does the VMware performance monitor show greatly increased IO, disk activity, or network activity during a restart?

    If restarting your GWIA takes longer during high IMAP activity, there may be additional information in other logs.
  • Hi.

    On 13.02.2017 at 19:26, gehorvath wrote:
    > --killthreads is there in all the gwia.cfg files and in GWAC. The web
    > interface shows also "Kill threads on exit or restart: Yes"
    > IMHO the problem is that GWIA cannot close the open IMAP threads in
    > time.


    FWIW: Killthreads does nothing for IMAP threads, and doesn't claim to do
    so. As per the docs:

    "Kill Threads on Exit or Restart: Configure the GWIA to stop
    immediately, without allowing its send/receive threads to perform their
    normal shutdown procedures."

    Aka, that's for SMTP threads only.

    CU,
    --
    Massimo Rosen
    Micro Focus Knowledge Partner
    No emails please!
    http://www.cfc-it.de
  • In article <iaqoA.475$tn5.462@novprvlin0913.provo.novell.com>, Massimo
    Rosen wrote:
    > FWIW: Killthreads does nothing for IMAP threads, and doesn't claim to do
    > so.


    Sounds like potentially something for an Idea:
    https://ideas.microfocus.com/MFI/novell-gw/

    Though I can't imagine what change would be so urgent for an IMAP-only
    GWIA that the restart couldn't be saved for an evening or early morning
    task. What is the business need for this?



    Andy of
    http://KonecnyConsulting.ca in Toronto

  • Today I turned on diagnostic logging. Unfortunately the users had already gone home, so I will have results only on Monday.
    One other thing to mention:
    The domain gets corrupted on this machine too often, sometimes twice or more in a week, and it is always related to a period of high IMAP usage. Back when SMTP and IMAP were on the same machine, SMTP messages were sometimes messed up, lost, or truncated during high IMAP usage. I opened an SR, and a GW developer went through the logs and partial MIME files and concluded that the OS had probably run out of file descriptors. The solution was to separate the IMAP and SMTP services onto different machines. After that the SMTP problems went away, but the IMAP problems stayed. Of course the OS limits are sky-high, but I am open to any suggestions.
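    A minimal sketch of how file descriptor headroom could be checked on Linux while the IMAP load is high. It counts the entries under /proc/<pid>/fd and reads the soft "Max open files" limit from /proc/<pid>/limits. The names `fd_usage` and `proc_root` are illustrative only and not part of GroupWise; the PID of the GWIA process has to be looked up separately.

```python
import os


def fd_usage(pid, proc_root="/proc"):
    """Return (open_fds, soft_limit) for a process.

    soft_limit is None when the limit is "unlimited" or the line is missing.
    """
    proc_dir = os.path.join(proc_root, str(pid))
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(os.path.join(proc_dir, "fd")))

    soft_limit = None
    with open(os.path.join(proc_dir, "limits")) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: name (three words), soft limit, hard limit, units.
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit
```

    Calling `fd_usage(pid)` for the GWIA process (as root or as the user the agent runs under) during a busy period would show how close the agent actually gets to its own limit, which may differ from the system-wide settings.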
  • So you are saying that if there are open IMAP connections on a GWIA, then it cannot restart itself and becomes unresponsive, and this is working as designed. If this is true, then it sounds serious enough to me to deserve at least a small side note in the official documentation.

    By the way, there is nothing in the logs at the diagnostic logging level either, just
    11:41:17 C50F Shutdown of Threads
    11:41:17 C50F Waiting for GWIMAP-Handler_10 to terminate...
    11:41:17 C50F Shutdown of Threads
    11:41:17 C50F Waiting for GWIMAP-Handler_10 to terminate...
    ...
    until I restart GWIA.
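    To quantify how long each handler holds up the shutdown, the repeated "Waiting for ... to terminate" lines can be tallied. A minimal sketch, assuming every log line has the `HH:MM:SS <thread id> <message>` shape seen in the excerpt above and that the log does not cross midnight; `summarize_shutdown` is an illustrative name, not a GroupWise tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the excerpts in this thread:
#   11:41:17 C50F Waiting for GWIMAP-Handler_10 to terminate...
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) ([0-9A-Fa-f]+) (.*)$")
WAIT_RE = re.compile(r"Waiting for (\S+) to terminate")


def summarize_shutdown(lines):
    """Count 'Waiting for X to terminate' lines per handler.

    Returns (counts, seconds between the first and last such line).
    """
    counts = Counter()
    first = last = None
    for raw in lines:
        line_match = LINE_RE.match(raw.strip())
        if not line_match:
            continue
        stamp, _thread_id, message = line_match.groups()
        wait_match = WAIT_RE.search(message)
        if not wait_match:
            continue
        when = datetime.strptime(stamp, "%H:%M:%S")
        if first is None:
            first = when
        last = when
        counts[wait_match.group(1)] += 1
    elapsed = int((last - first).total_seconds()) if first is not None else 0
    return counts, elapsed
```

    Feeding it the lines of a GWIA log file would show which handlers the agent keeps waiting on and for roughly how long.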
  • gehorvath;2451399 wrote:
    So you are saying that if there are open IMAP connections on a GWIA, then it cannot restart itself and becomes unresponsive, and this is working as designed.


    Hi Gellért,

    Most developers assign a higher priority to program defects than they do to requests for program enhancements. When a feature does not work as a customer expected and additional code is required to make it do so, that is often considered an enhancement, and the response is likely to be "Working as designed". Less frequently, a product can have a design defect, and instead of resolving it, a developer may claim the product is "Working as designed". I don't think that is the case with this issue.

    In a previous post you said:
    The domain gets corrupted on this machine too often, sometimes twice or more in a week, and it is always related to a period of high IMAP usage. Back when SMTP and IMAP were on the same machine, SMTP messages were sometimes messed up, lost, or truncated during high IMAP usage. I opened an SR, and a GW developer went through the logs and partial MIME files and concluded that the OS had probably run out of file descriptors. The solution was to separate the IMAP and SMTP services onto different machines. After that the SMTP problems went away, but the IMAP problems stayed.


    You have opened an SR and have implemented the suggestions the support group has provided. It appears their suggestions have not resolved your issue. Your next step should be to update the SR with the information you provided in this thread and any other new information you have gathered since implementing their suggestions. If you are not getting the level of support you feel is necessary to resolve this issue, you should have the issue escalated and, if necessary, speak with a manager.

    Clearly, your system is being impacted and it appears that the root cause has yet to be determined. It may not even be a GroupWise issue.

    Follow up with Tech Support and please let us know what you learn.
  • gehorvath;2451212 wrote:
    Back when SMTP and IMAP were on the same machine, SMTP messages were sometimes messed up, lost, or truncated during high IMAP usage. I opened an SR, and a GW developer went through the logs and partial MIME files and concluded that the OS had probably run out of file descriptors. The solution was to separate the IMAP and SMTP services onto different machines. After that the SMTP problems went away, but the IMAP problems stayed.


    Hi Gellért,

    When you separated IMAP and SMTP service to different machines, did you


    • Create a new VM for the SMTP service?
    • Create a new VM for the IMAP service?
    • Create a new VM for each service?

    The domain gets corrupted on this machine too often, sometimes twice or more in a week, and it is always related to a period of high IMAP usage.

    If it is related to IMAP activity, I'm curious whether this is happening on the original VM where both services were running or, assuming it was the IMAP service that was moved to a new VM, the VM where IMAP is now running.

    Have you looked for general file system errors?

    Have you looked at /var/log/messages to see if there are system issues that could be contributing to this behavior?

    Do you, by chance, have an antivirus program scanning your GroupWise directories?
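    One way to approach the /var/log/messages question is to filter the log for a few well-known trouble signatures. A minimal sketch: the pattern list is an assumption (common Linux messages, neither exhaustive nor GroupWise-specific), and `scan_syslog` is an illustrative name.

```python
import re

# Assumed, non-exhaustive signatures of system-level trouble.
DEFAULT_PATTERNS = (
    r"Too many open files",               # a descriptor limit was hit
    r"I/O error",                         # device or file system read/write failure
    r"Out of memory",                     # kernel OOM killer activity
    r"(?:EXT4-fs|XFS|BTRFS).*error",      # file system errors reported by the kernel
)


def scan_syslog(lines, patterns=DEFAULT_PATTERNS):
    """Return the lines matching any pattern, case-insensitively."""
    combined = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if combined.search(line)]
```

    Running it over the log lines from the time window of a slow restart or a domain corruption would narrow down whether the operating system reported anything at the same moment.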
  • gehorvath;2450883 wrote:
    There are 2 load balancers (ipvsadm), one for IMAP and one for SMTP.

    Hi Gellért,

    Could ipvsadm be contributing to the problem?

    I don't know how your system is configured or whether ipvsadm is involved in the GWIA shutdown at all, but a Google search revealed that some users encounter TCP timeouts, which would definitely slow down traffic.

    Since you still don't know the cause of the restart delays, I'm just thinking outside the box (pun intended).