GWPOA trap - post office dies

I'm seeing in /var/log/messages a line:

kernel: [blah] gwpoa[psid] trap stack segment ip:pointer sp:pointer error:0 in libc-2.11.3

And the post office is dead. I did get a core dump. Is the next logical step to open a ticket with Micro Focus support?

This post office dies maybe once every few weeks. Prior to this, the error in /var/log/messages was:

kernel: [blah] gwpoa: page allocation failure: order:1, mode:0x20

But last weekend we upgraded to build 129832 of GroupWise.
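A quick way to gauge how often this is happening is to grep the log for the gwpoa signatures quoted above. This is a generic triage sketch, not anything GroupWise-specific; the /var/log/messages path is the SLES default and may differ on your setup, and `count_gwpoa_traps` is just an illustrative name.

```shell
# Triage sketch: count gwpoa crash signatures in a syslog file.
count_gwpoa_traps() {
    grep -cE 'gwpoa.*(trap stack segment|general protection|page allocation failure)' "$1"
}

# Typical use (path is the SLES default):
#   count_gwpoa_traps /var/log/messages
```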


  • In article <DGerisch.8gphwo@no-mx.forums.microfocus.com>, DGerisch
    wrote:
    > This post office dies maybe once every few weeks. Prior to this, the
    > error in /var/log/messages was:
    >
    > kernel: [-blah-] gwpoa: page allocation failure: order:1, mode:0x20


    How much free memory is the server running with? GroupWise will try to
    use as much as it can for cache, so we check by monitoring how much
    swap space is being used. If swap is regularly more than half used, you
    have a memory issue.

    Which version of SLES are you running on? Is it with OES? Do you have
    ganglia handy to see the memory use graphs?

    Is this a VM or on physical hardware? Trying to think of other memory
    checks in case there is plenty of it.

    Do these crashes happen at the same time of day, or at just any time?


    Andy of
    http://KonecnyConsulting.ca in Toronto
    Knowledge Partner
    http://forums.novell.com/member.php/75037-konecnya
    If you find a post helpful and are logged in the Web interface, please
    show your appreciation by clicking on the star below. Thanks!
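Andy's rule of thumb above (swap regularly more than half used means a memory issue) can be checked with a short script. A sketch assuming a Linux /proc/meminfo; `swap_check` is a made-up name, not a GroupWise or OES tool.

```shell
# Sketch of the more-than-half-swap rule of thumb, reading the kB
# counters from /proc/meminfo.
swap_check() {
    total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
    used=$(( total - free_kb ))
    if [ "$total" -gt 0 ] && [ $(( used * 2 )) -gt "$total" ]; then
        echo "WARNING: swap more than half used ($used of $total kB)"
    else
        echo "swap OK ($used of $total kB used)"
    fi
}
swap_check
```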

  • The server has 8 GB RAM. It is using a little under 2 GB swap. Uptime is 33 days.

    Yes, it is running OES; version 11.2 (SLES 11.3)

    This is a virtual machine guest.

    I think we turned ganglia off because it consumes a lot of disk space on the root partition, and our VM templates have a root partition on the small side (20 GB).

    The crashes do not seem to follow a pattern of time of day. Three days ago it died in the middle of the night (gw checks were running). Yesterday, it died around 9:15 in the morning (nothing but normal user work going on at the time).

    Thank you for any ideas you have.
  • In article <DGerisch.8grumn@no-mx.forums.microfocus.com>, DGerisch wrote:
    > I think we turned ganglia off because it consumes a lot of disk space on
    > the root partition,


    Something is off with that, unless you have a whole lot of servers
    feeding it. In one deployment I see 255 MB for an instance that has
    watched 5 servers for a few years. If you are seeing the
    /var/opt/novell/ganglia folder at much more than 50 MB per server in the
    group, then something is wrong there. Ganglia is a powerful enough
    maintenance tool that it is worth fixing that.
    Depending on how it was turned off, the other servers may still hold the
    data, so it is worth checking whether they have your POA's server in it.

    > It is using a little under 2 GB swap.

    Is it actually using it all? The output of
    free -m
    should typically show more swap unused than used, otherwise you are likely
    memory starved and just growing the RAM allocation by a couple GB will
    make a big difference getting rid of those memory allocation errors and
    likely improving performance. For more on what free tells us, see
    http://www.konecnyad.ca/andyk/freemem.htm


    Andy of
    http://KonecnyConsulting.ca in Toronto
    Knowledge Partner
    http://forums.novell.com/member.php/75037-konecnya
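On the ganglia disk-space point: the roughly 50 MB-per-server figure above is easy to check directly. A sketch; the path comes from the post, and `ganglia_usage` is just an illustrative name.

```shell
# Report how much disk ganglia's data store is using (default path
# from the post above).
ganglia_usage() {
    dir="${1:-/var/opt/novell/ganglia}"
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "no ganglia data directory at $dir"
    fi
}
ganglia_usage
```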

  • I'm seeing the same thing in GW 18.0.1 (running groupwise-server-18.0.1-129782), three times in two days now.
    From dmesg:
    [1435940.872540] gwpoa[6628] general protection ip:7f415d94f453 sp:7f4156bc9620 error:0 in libc-2.11.3.so[7f415d8d6000 172000]
    [1455277.651469] gwpoa[19639] general protection ip:7f6f88cf83cf sp:7f6f80da2f10 error:0 in libc-2.11.3.so[7f6f88c7f000 172000]


    Server has 24GB of memory. 124M of 4GB swap being used. Here's free -m:
                 total       used       free     shared    buffers     cached
    Mem:         24161      17604       6557          2         50       8469
    -/+ buffers/cache:       9084      15077
    Swap:         4102        124       3978



    I don't have a core, but I just set GROUPWISE_MAX_CORE_FILE_SIZE="unlimited" in /etc/init.d/grpwise, so it should generate one next time it dumps.
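Before waiting for the next crash, it may be worth confirming that a core can actually be written. These are generic Linux checks, not GroupWise-specific; only the GROUPWISE_MAX_CORE_FILE_SIZE variable and the /etc/init.d/grpwise path come from the post.

```shell
# Generic Linux checks: is the core size limit lifted, and where will
# a core file land?
ulimit -c                            # should read "unlimited" once the change is active
cat /proc/sys/kernel/core_pattern    # filename pattern (or pipe handler) for cores
grep -s GROUPWISE_MAX_CORE_FILE_SIZE /etc/init.d/grpwise || \
    echo "grpwise init script not present on this host"
```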
  • same here: gwpoa dead

    physical server with 32GB RAM, OES11 SP3, groupwise-server-14.2.3-129832

    /var/log/messages: kernel: [2115919.691286] gwpoa[6475] trap stack segment ip:7fb53de293d6 sp:7fb5323c0180 error:0 in libc-2.11.3.so[7fb53ddb0000 172000]

    It never happened before; this started right after the update from GroupWise 14.2.2 to 14.2.3.
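The ip and mapping base in these trap lines are enough to compute the offset inside libc, which can then be fed to addr2line against the matching libc (with debuginfo installed). A sketch using the values from the message above; the library path in the comment is an assumption for SLES 11.

```shell
# Compute the faulting offset inside libc from the trap line above.
ip=0x7fb53de293d6      # the "ip:" value (faulting instruction pointer)
base=0x7fb53ddb0000    # start of the libc mapping, first number in [...]
printf 'offset into libc: 0x%x\n' $(( ip - base ))
# Resolve with something like (path assumed for SLES 11):
#   addr2line -e /lib64/libc-2.11.3.so <offset>
```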
  • I've opened an SR about this (SR101164367711) and submitted cores. It's been escalated to engineering, but haven't heard anything back yet from backline.
    It's still happening pretty much daily.
  • There don't happen to be old versions (like 2012) of WebAccess still
    accessible to users?


    On 5/24/2018 8:44 AM, adrockk wrote:
    >
    > I've opened an SR about this (SR101164367711) and submitted cores. It's
    > been escalated to engineering, but haven't heard anything back yet from
    > backline.
    > It's still happening pretty much daily.


  • unsigned;2481691 wrote:
    There doesn't happen to be old versions (like 2012) of Webaccess still
    user accessible?


    No old Webaccess, no.
    I do have GW Mobility 2014r2 still running (since the db update scripts are broken out of the box for me, and I have yet to find time to rebuild it from scratch).

    I did receive a response from engineering regarding the ticket, and they created a bug for it, and the target fix isn't until 18.0.2; they didn't give me the bug #, but I just asked for it.
  • Adrockk,

    > I did receive a response from engineering regarding the ticket, and they
    > created a bug for it, and the target fix isn't until 18.0.2; they didn't
    > give me the bug #, but I just asked for it.


    check your email :-)

    Pam

  • Pam has been nice enough to reach out with an update:
    The Bug is #1093929, so if you want to open an SR for this, referencing that bug might expedite the process. (and might help with getting an FTF out sooner than SP2).