POA Segfault error 4 in libtoolkit.so.1


We've got a customer that is regularly getting this error in /var/log/messages, and their Post Office crashes every time. Two weeks ago we migrated the Post Office from OES2 SP3 / GroupWise 8.0.3 to OES11 SP2 (64-bit) / GroupWise 2012; everything ran smoothly for at least 3-4 days, and then our current problems started. The line above is the error message that gets written to messages after the POA crashes. We have GroupWise 2012 SP1 on OES11 SP1 (64-bit) with 4 GB RAM, 2 CPUs, around 250 users, and plenty of disk space; top shows that the system is not lacking resources.

We have two Post Offices running on the server: one for users and the other for a library. The POA crashes with 50, 150, or 200 users connected. The users connect via client/server and use the GroupWise 8 client. After some testing, it appears that the POA crashes when a user tries to send an e-mail to all users. The POA crashed yesterday, and it appears the GWIA also "died" after the POA "died".

When the POA "dies" (crashes), sometimes we can restart it with the rcgrpwise start po.dom command, and sometimes that does not work. When it does not work, we have to unload the running instance of gwpoa (the library Post Office; there are two instances of gwpoa running), then start the user PO, and then restart the library PO; after that it is running again. There were several old mails in the wpcsout/problem directory, which I've cleaned out, but the system has crashed since then.

Normally a segmentation fault means either that not enough resources are available for the POA (which seems unlikely here; maybe we should go to 8 GB RAM, and the threads look OK), or that there is a bug, perhaps a corrupt e-mail or user database. But our GWChecks show nothing dramatic.

There have been network problems; a switch died last week, but unfortunately I have no more information on that. We've also added a secondary IP address to the server and configured the POA to use it.

Any ideas?

Well, any help would be greatly appreciated and thank you in advance for your assistance.



  • Hi Mike,

    I'm sorry to hear that you are having these issues :( First question I'm going to ask is what support pack are you running for your 2012 system?

    Look forward to hearing back from you.

  • Hi Laura,

    It's not my system; I just refer to the GroupWise systems at our customers as "ours". A little too involved, maybe?

    We've got SP1 (12.0.1-103731). There was a reason for that which I've since forgotten.

    Should I be reading the Readme for SP1 and/or SP2 again? Is this a known issue?

    Thanks for your assistance.
  • Hi,

    Thank you for the feedback. In Novell's internal bug tracking system I did find this bug: Bug 758063 - "GWPOA crashes intermittently".

    And the synopsis sounds very much like your setup. Now, according to what I can find, this was resolved in SP2.

    So my first suggestion is to upgrade your customer's GroupWise system to SP2.

    You may also want to take a look at this TID: https://www.novell.com/support/kb/doc.php?id=3447847
    The reason for this is that if the crashes keep happening, I think you may have to submit a core to Novell for analysis via an SR.

    Please post back and let us know how it goes.

  • Hi,

    Thanks for your help. I'll present the customer with this info and see if we can't get them updated to SP2.

    That said, it seems as if I've solved the problem without SP2. Here's what I did.

    The POA had 25 C/S threads and 10 message handler threads for 150 users; I changed these settings to 12 C/S threads and 6 message handler threads. My understanding is that you need one C/S thread for every 25 users (connections?), and that message handler threads should be no more than half the C/S threads, so with a consistent 150 users my values seem OK. I also checked and/or made the necessary optimizations in /etc/opt/novell/nss/nssstart.cfg, in /etc/opt/novell/ncpserv.conf, and in Remote Manager. The customer also had Postfix running alongside the GWIA; of course I deactivated that. Finally, the VM had 4 GB RAM and 4 CPUs, which I changed to 8 GB RAM and 2 CPUs. There were also 3 corrupt messages in their GWIA directories and a couple of hundred in the POA directories; those were removed as well.
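
    As a sanity check on that rule of thumb (one C/S thread per 25 users, and message handler threads at no more than half the C/S count), here is a minimal sketch of the arithmetic. The ratios are my reading of the tuning advice in this thread, not official Novell guidance:

    ```python
    import math

    def suggest_poa_threads(users, users_per_cs_thread=25):
        """Rough POA thread sizing: one client/server (C/S) thread per
        25 concurrent users, and at most half that many message handler
        threads (minimum of one)."""
        cs_threads = math.ceil(users / users_per_cs_thread)
        handler_threads = max(1, cs_threads // 2)
        return cs_threads, handler_threads

    print(suggest_poa_threads(150))  # (6, 3)
    ```

    By this baseline, 150 users need only 6 C/S threads and 3 message handler threads, so the reduced values above (12 and 6) still leave plenty of headroom.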

    Since changing all of the above settings, plus a couple of others, and restarting, it seems to be working fine. According to the customer, ever since the migration/upgrade the POA would crash whenever a user sent a mail to all GroupWise users. We've tested this and it now seems fine.

    Nevertheless, I will be trying to convince the customer to update to sp2 ASAP.

    Once again, thank you for your assistance. It is still possible that this "case" (thread) will be re-opened; only time will tell.
  • Hi,

    Thanks for keeping us updated. Much appreciated :)

    The solution that I implemented has not worked. Without the changes, the POA would "die" every day, if not several times a day, and it would die every time someone sent an e-mail to all users. With the changes, the Post Office agent only "dies" about once a week. But it still dies.

    So we'll be updating their entire system to 12.0.2 this afternoon. In a couple of weeks we should, hopefully, have some results.

    I'll keep this forum informed.

    Once again, thanks!

  • Hi Mike,

    Thanks for posting back. Please do let us know if SP2 solves the problem.

  • Hi,

    SP2 did not solve the problem. Not good....

    Here's the error message in /var/log/messages:

    Sep 24 08:11:41 kzvk-lx-pkggw01 kernel: [52545.368430] gwpoa[10987]: segfault at ebf012bd ip 00000000f69f6368 sp 00000000efb7e440 error 4 in libtoolkit.so.1[f6987000 a4000]
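
    For what it's worth, the "error 4" in that kernel line is the x86 page-fault error code, a small bitmask: bit 0 set means the page was present (a protection fault), bit 1 set means the access was a write, and bit 2 set means the fault happened in user mode. A value of 4 is therefore a user-mode read of an unmapped address, i.e. the POA dereferenced a bad pointer inside libtoolkit.so.1, which points to a software bug rather than resource exhaustion. A small decoder to illustrate:

    ```python
    def decode_segfault_error(code):
        """Decode the x86 page-fault error code from a kernel
        'segfault at ... error N' message."""
        present = bool(code & 1)  # bit 0: page was present (protection fault)
        write = bool(code & 2)    # bit 1: access was a write (else a read)
        user = bool(code & 4)     # bit 2: fault occurred in user mode
        return {
            "access": "write" if write else "read",
            "mode": "user" if user else "kernel",
            "cause": "protection violation" if present else "page not mapped",
        }

    print(decode_segfault_error(4))
    # {'access': 'read', 'mode': 'user', 'cause': 'page not mapped'}
    ```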

    It's running on OES11 SP1 and holds a replica of root. Everything seems fine with the host server. GroupWise is on an NSS volume. We've opened a call with Novell: SR 10857038181.

    Any help would be greatly appreciated. Now I've got to sit down with the IT staff here and the big bosses and try to explain to them what has happened....

    Wish me luck.
  • Hi,

    Good luck! Work with Novell to get a core of your server, and they should be able to figure out what is going on.

    Please keep me posted with the progress on this.