GW 2014 R2 - gwdva segfaults / POA hangs/crashes

I upgraded to GW 2014 R2 on 8 December 2015 (08-12-2015), and since then we have had a lot of issues with the POA crashing and gwdva segfaulting.

Dec 9 15:10:12 grpwise1 kernel: [6611998.920528] gwdva[21369] general protection ip:7f46e8334713 sp:7f46e20c10d0 error:0 in libsvrtk.so.1[7f46e8306000 39000]
Dec 10 09:16:09 grpwise1 kernel: [6676979.446209] gwdva[10260]: segfault at 9000007f80 ip 00007f604efd2713 sp 00007f6048de00d0 error 4 in libsvrtk.so.1[7f604efa4000 39000]
Dec 10 09:25:52 grpwise1 kernel: [6677560.446879] gwdva[10734] general protection ip:7fd485fcb713 sp:7fd47fc560d0 error:0 in libsvrtk.so.1[7fd485f9d000 39000]
Dec 10 09:26:03 grpwise1 kernel: [6677571.867202] gwdva[10775]: segfault at 24 ip 0000000000424d8e sp 00007f4506b62b00 error 4 in gwdva[400000 4d000]
Dec 10 10:49:55 grpwise1 kernel: [6682590.461549] gwdva[12634] general protection ip:7fe5e6fa8713 sp:7fe5e0db60d0 error:0 in libsvrtk.so.1[7fe5e6f7a000 39000]
Dec 10 10:52:13 grpwise1 kernel: [6682727.502642] gwdva[12690]: segfault at 100000020 ip 00007f731ca7e713 sp 00007f731680b0d0 error 4 in libsvrtk.so.1[7f731ca50000 39000]
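
The faulting offset inside each module can be worked out from these lines: it is the ip value minus the mapping base shown in square brackets. A minimal sketch, assuming only the two kernel message formats shown above; fault_offsets is a hypothetical helper name, and the log path in the usage comment is an assumption.

```shell
# Hypothetical helper: read kernel log lines on stdin and print, per module,
# how many gwdva faults hit each offset (ip minus mapping base).
fault_offsets() {
    sed -n -E 's/.*gwdva\[[0-9]+\].* ip[: ]([0-9a-f]+) .* in ([^[]+)\[([0-9a-f]+) [0-9a-f]+\].*/\1 \2 \3/p' |
    while read -r ip mod base; do
        # Both values are hex without a 0x prefix in the kernel message.
        printf '%s+0x%x\n' "$mod" $(( 0x$ip - 0x$base ))
    done | sort | uniq -c | sort -rn
}

# Usage (log path is an assumption):
#   fault_offsets < /var/log/messages
```

For the six lines above this reports libsvrtk.so.1+0x2e713 five times and gwdva+0x24d8e once, so the libsvrtk.so.1 faults all appear to hit the same instruction, which may be useful detail for the SR.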

Novell support advised us to rebuild the POA DB and switch to gwdva (instead of DCA), but nothing seems to work.

I also see a lot of messages like these in the POA log:
11:29:43 434B Error streaming an attachment [8911]
11:29:43 435B Error streaming an attachment [8911]
11:29:43 4353 Error streaming an attachment [8911]
10:51:57 EA92 Possibly damaged blob in database = user6ft.db
10:51:57 EA7A Possibly damaged blob in database = user6ft.
11:40:22 43FC Conversion Failed: Error [8912] (/grpwise/poa/oftemp/gwdca/in/5669646c.tmp) exceeded maximum conversion time limit
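
To see which error codes dominate over a longer stretch of the log, the bracketed codes can be tallied. A minimal sketch, assuming the codes always appear as four hex digits in square brackets as in the lines above; tally_poa_errors is a hypothetical helper name and the log path is a placeholder.

```shell
# Hypothetical helper: count bracketed POA error codes (e.g. [8911]) read
# from stdin and list them most frequent first.
tally_poa_errors() {
    grep -oE '\[[0-9A-Fa-f]{4}\]' | sort | uniq -c | sort -rn
}

# Usage (replace the placeholder with the actual POA log file):
#   tally_poa_errors < /path/to/poa.log
```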

I am also seeing files here:
/grpwise/poa/oftemp/gwdca/problem

and a lot of htm files in the /grpwise/poa/oftemp/gwdca/out directory since we upgraded.
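
To keep an eye on how fast these directories fill up, the files can be counted without touching them. A minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do); count_files and count_older_than are hypothetical helper names, and nothing here deletes anything.

```shell
# Hypothetical helpers: count regular files directly inside a directory.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Same, but only files last modified more than N whole days ago
# (find's -mtime +N semantics).
count_older_than() {
    find "$1" -maxdepth 1 -type f -mtime +"$2" | wc -l | tr -d ' '
}

# Usage, with the paths from this post:
#   count_files /grpwise/poa/oftemp/gwdca/problem
#   count_files /grpwise/poa/oftemp/gwdca/out
#   count_older_than /grpwise/poa/oftemp/gwdca/out 1
#   du -sh /grpwise/poa/oftemp/gwdca/out
```

Running the count before and after toggling QuickFinder indexing would show whether the growth follows the indexer.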

I ran GWCheck last night with everything enabled. That seems to clear up the messages until the POA crashes again, at which point they start reappearing. I also disabled QuickFinder indexing, but that doesn't seem to help either.

Is anyone else seeing anything like this? / Any suggestions for workarounds?

Kind regards, Don

  • Hi Don,

    Not fun when that happens...

    For a more complete picture, could you tell us some more about the details of the system(s) GroupWise is running on?
    Things like version and SP level of the host, amount of RAM, etc.

    Please also post the output of these commands:

    # rpm -qa | grep groupwise
    # uname -a
    # zypper ca

    Also, what filesystem are the GroupWise databases on? And have there been changes there, or was this a straight in-place upgrade?


    Cheers,
    Willem
  • AVG-Don;2413437 wrote:
    I upgraded to GW 2014 R2 on 8 December 2015 (08-12-2015), and since then we have had a lot of issues with the POA crashing and gwdva segfaulting.

    ...

    Novell support advised us to rebuild the POA DB and switch to gwdva (instead of DCA), but nothing seems to work.


    I'm also curious if a core dump has been uploaded to and analyzed by Novell?
  • Hello Willem,

    Thank you for your response.

    It appears I had disabled gwdva/QuickFinder in the POA web interface instead of in webadmin, so it kept being re-enabled. Disabling it in GroupWise webadmin seems to be a workaround.

    As for the environment, we are running GroupWise on SLES 11 SP3 with 10 GB of RAM, on an ext3 filesystem.

    # rpm -qa|grep groupwise
    novell-groupwise-server-14.2.0-122092
    novell-groupwise-gwmon-14.2.0-122092
    novell-groupwise-monitor-webapp-14.2.0-122092
    novell-groupwise-gwha-14.2.0-122092

    uname: Linux grpwise1 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux

    # zypper ca
    # | Alias | Name | Enabled | Refresh
    --- -------------------------------------------------- -------------------------------------------------- --------- --------
    1 | SUSE-Linux-Enterprise-Server-11-SP2 11.2.2-1.234 | SUSE-Linux-Enterprise-Server-11-SP2 11.2.2-1.234 | No | No
    2 | nu_novell_com:SLE11-Public-Cloud-Module | SLE11-Public-Cloud-Module | No | Yes
    3 | nu_novell_com:SLE11-SP1-Debuginfo-Pool | SLE11-SP1-Debuginfo-Pool | No | Yes
    4 | nu_novell_com:SLE11-SP1-Debuginfo-Updates | SLE11-SP1-Debuginfo-Updates | No | Yes
    5 | nu_novell_com:SLE11-SP2-Debuginfo-Core | SLE11-SP2-Debuginfo-Core | No | Yes
    6 | nu_novell_com:SLE11-SP2-Debuginfo-Updates | SLE11-SP2-Debuginfo-Updates | No | Yes
    7 | nu_novell_com:SLE11-SP3-Debuginfo-Pool | SLE11-SP3-Debuginfo-Pool | No | Yes
    8 | nu_novell_com:SLE11-SP3-Debuginfo-Updates | SLE11-SP3-Debuginfo-Updates | No | Yes
    9 | nu_novell_com:SLE11-Security-Module | SLE11-Security-Module | No | Yes
    10 | nu_novell_com:SLE11-WebYaST-SP2-Pool | SLE11-WebYaST-SP2-Pool | No | Yes
    11 | nu_novell_com:SLE11-WebYaST-SP2-Updates | SLE11-WebYaST-SP2-Updates | No | Yes
    12 | nu_novell_com:SLES11-Extras | SLES11-Extras | No | Yes
    13 | nu_novell_com:SLES11-SP1-Pool | SLES11-SP1-Pool | No | Yes
    14 | nu_novell_com:SLES11-SP1-Updates | SLES11-SP1-Updates | No | Yes
    15 | nu_novell_com:SLES11-SP2-Core | SLES11-SP2-Core | No | Yes
    16 | nu_novell_com:SLES11-SP2-Extension-Store | SLES11-SP2-Extension-Store | No | Yes
    17 | nu_novell_com:SLES11-SP2-Updates | SLES11-SP2-Updates | No | Yes
    18 | nu_novell_com:SLES11-SP3-Extension-Store | SLES11-SP3-Extension-Store | Yes | Yes
    19 | nu_novell_com:SLES11-SP3-Pool | SLES11-SP3-Pool | Yes | Yes
    20 | nu_novell_com:SLES11-SP3-Updates | SLES11-SP3-Updates | Yes | Yes

    The GroupWise databases are on an ext3 filesystem.

    Thank you for the suggestion regarding core dump analysis. I have sent the core dumps to Novell support.

    Kind regards, Don
  • AVG-Don;2413540 wrote:
    It appears I had disabled gwdva/QuickFinder in the POA web interface instead of in webadmin, so it kept being re-enabled. Disabling it in GroupWise webadmin seems to be a workaround.


    OK, good that the system is at least stable again now.

    AVG-Don;2413540 wrote:
    As for the environment, ..


    I don't see anything strange in that output... the system itself looks up to date with proper channel registration etc...

    AVG-Don;2413540 wrote:
    Thank you for the suggestion regarding core dump analysis. I have sent the core dumps to Novell support.


    Before moving further with all sorts of database checks etc, I think it's probably best to await Novell's answer on what they see in the core dump you've sent them.


    With QuickFinder disabled, are there any notable errors (related to database corruption, locks, etc.) in the POA logs?

    Cheers,
    Willem
  • Thanks for the response and suggestions.

    In the end I got the issues resolved by adding the "--dcafilter pdf" switch to the POA startup config, moving the DVA to another server, running GWCheck with all options enabled, rebuilding some mailboxes (just to be sure) and regenerating the QuickFinder indexes for a lot of mailboxes.

    Everything seems stable again now. Perhaps this information will benefit others and save them the time of figuring this out.

    Kind regards, Don
  • Hi Don,

    AVG-Don;2413782 wrote:
    In the end I got the issues resolved by adding the "--dcafilter pdf" switch to the POA startup config, moving the DVA agent to another server, running GWcheck with all options enabled, rebuilding some mailboxes (just to be sure) and regenerating quickfinder indexes for a lot of mailboxes.

    Everything seems stable again now,


    Good to hear you've been able to work through it. That makes me curious as to what's in those PDFs that seemingly leads to those previous crashes.


    AVG-Don;2413782 wrote:
    perhaps this information will benefit others and save them the time to figure this out.


    I'm sure it will. Thanks for feeding it back into the forums!

    Cheers,
    Willem
  • I am having the exact same issues as AVG-Don. I have moved the DVA to another server and am running GWCheck right now. I FTPed my core file to Novell today so an engineer can look at it. I will let you know if the steps AVG-Don posted fix my issue as well.

    - Duggan
  • Same issue here since upgrading from GW 2014.0.2 to GW 2014 R2 in December (gwdva caused high utilization on the POA and finally crashed).
    We never had problems with a crashed GroupWise system before (OES 11 SP2).

    The fix for the moment is the same as described here: disable QuickFinder on the PO server and stop gwdva on the WebAccess server.

    We opened an SR and hope for a gwdva bug fix.
  • Same here... We just upgraded from GW 2012 to 2014 R2, and the PO has hung about 5 times in 2 days.

    When users report that email is offline, rcgrpwise status shows the POA as running, but gwdva is just "dead". Restarting the processes gets things back on track temporarily.

    Running OES 11 (x86_64) SP2, on SLES 11 SP3

    FINDINGS:

    /var/log/messages:
    Jan 12 14:51:27 srvgw1 kernel: [312849.695982] gwdva[2567]: segfault at 100000021 ip 00007f898c11d713 sp 00007f8985eae0d0 error 4 in libsvrtk.so.1[7f898c0ef000 39000]
    Jan 12 14:56:50 srvgw1 kernel: [313172.189327] gwdva[11003] general protection ip:7fbb8fa2a713 sp:7fbb897bb0d0 error:0 in libsvrtk.so.1[7fbb8f9fc000 39000]
    Jan 13 08:54:32 srvgw1 kernel: [377696.833587] gwdva[19735] general protection ip:7f1021208713 sp:7f101b01a0d0 error:0 in libsvrtk.so.1[7f10211da000 39000]
    Jan 13 10:44:21 srvgw1 kernel: [384272.445126] gwdva[30296] general protection ip:7fc68ec85713 sp:7fc6889950d0 error:0 in libsvrtk.so.1[7fc68ec57000 39000]
    Jan 13 12:22:14 srvgw1 kernel: [390133.371591] gwdva[9782] general protection ip:7f4ef9d56713 sp:7f4ef3ae70d0 error:0 in libsvrtk.so.1[7f4ef9d28000 39000]
    Jan 13 12:24:33 srvgw1 kernel: [390271.871705] gwdva[20569] general protection ip:7f8b71048713 sp:7f8b6ad580d0 error:0 in libsvrtk.so.1[7f8b7101a000 39000]
    Jan 13 12:24:35 srvgw1 kernel: [390273.677795] gwdva[18456]: segfault at 18 ip 00007f06a06ee2c2 sp 00007fff971f2b40 error 4 in libwv_core.so[7f06a0596000 3e2000]
    Jan 13 13:26:23 srvgw1 kernel: [393973.628652] gwdva[20688] general protection ip:7f4e2e5d9713 sp:7f4e2836a0d0 error:0 in libsvrtk.so.1[7f4e2e5ab000 39000]
    Jan 13 15:26:22 srvgw1 kernel: [401157.361517] gwdva[27548] general protection ip:7f395500d713 sp:7f394ed1d0d0 error:0 in libsvrtk.so.1[7f3954fdf000 39000]
    Jan 13 15:26:31 srvgw1 kernel: [401166.217473] gwdva[27563]: segfault at 24 ip 0000000000424d8e sp 00007f7fcde11b00 error 4 in gwdva[400000 4d000]
    Jan 13 15:26:31 srvgw1 kernel: [401166.591977] gwdva[27579]: segfault at 24 ip 0000000000424d8e sp 00007f6652520b00 error 4 in gwdva[400000 4d000]
    Jan 13 15:37:43 srvgw1 kernel: [401836.910923] gwdva[973] general protection ip:7f1c1de81713 sp:7f1c17c120d0 error:0 in libsvrtk.so.1[7f1c1de53000 39000]



    POA logs:
    14:13:08 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b75.tmp)
    14:13:08 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b76.tmp)
    14:13:09 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b78.tmp)
    14:13:09 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b79.tmp)
    14:13:09 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b7b.tmp)
    14:13:09 A06A Conversion Failed: Error [8210] (/media/nss/GW/gwsystem/po1/oftemp/gwdca/in/56965b7c.tmp)


    GROWING NUMBER OF FILES HERE:
    # at about 14:00 hrs
    ll /media/nss/GW/gwsystem/po1/oftemp/gwdca/out | wc -l
    136146


    # at 20:23 hrs
    ll /media/nss/GW/gwsystem/po1/oftemp/gwdca/out | wc -l
    139446


    This "out" folder is 15 GB, and the file count hasn't grown in the last 4 hours or so, which could correspond with turning off QuickFinder.

    Does anyone know if these files can just be deleted?

    CORE FILE
    I found this core file: /core
    and it was most certainly created by gwdva:

    strings /core | head
    CORE
    CORE
    gwdva
    /opt/novell/groupwise/agents/bin/gwdva /ip=192.168.0.5 /httpport=8301 /maxtime
    CORE
    CORE
    in processing request - convert convert</h1></b
    LINUX
    in processing request - convert convert</h1></b
    CORE


    I'll have to raise an SR also.
  • I have had the same issue since upgrading. I have an open SR with Novell. I have sent them several GW cores and it is still with engineering. Has anyone else gotten a fix yet? I have been fighting this for a week. For now I'm fairly stable because I added the dcafilter switch, removed the DVA from my main PO server and have the POA use one on another server. I really expected a fix from Novell by now. Please let me know if anyone hears anything. Thanks.