mschlawin Frequent Contributor.

PO Crashes - glibc double free or corruption

Hello,
I'm running GW 8 SP2 on OES Linux (SLES 10 / OES 2). This system has been VERY stable and has been running just fine for the last several years. I have not made any changes to the system recently, other than the patch channel patches. This GW server has a GWIA, an MTA, and two POs on it.

One of the POs has now crashed four times in the last three days (twice this morning). The error is:

fvlgw:/media/nss/GWVOL/gwlogs/teachpo # *** glibc detected *** ./gwpoa: double free or corruption (fasttop): 0x087a60e8 ***

When I restart the post office, I get a DA03 error mailed to me.

I have searched the forums and found a suggestion to add the line ulimit -n 2048 to the /etc/init.d/grpwise startup file. I did that on Saturday and have had three crashes since.

Does anyone have any idea what I can do next? I'm starting to panic!

Matt
Bob-O-Rama
Visitor.

Re: PO Crashes - glibc double free or corruption

Don't panic. First off, ulimit can hurt you too. Use ps -ef | grep gwpoa to find your POA's process ID.

gwuber:~ # ps -ef | grep gwpoa
root 21313 1 2 Jan22 ? 11:46:04 /opt/novell/groupwise/agents/bin/gwpoa @gws3_po.poa
root 21355 1 2 Jan22 ? 12:39:03 /opt/novell/groupwise/agents/bin/gwpoa @gws2_po.poa
root 21396 1 2 Jan22 ? 13:50:31 /opt/novell/groupwise/agents/bin/gwpoa @gws1_po.poa
root 28344 1 11 Jan25 ? 2-06:43:27 /opt/novell/groupwise/agents/bin/gwpoa @staff.poa


Then, taking a PID from the example above (28344 here), you can see the ACTUAL limits on that process:

gwuber:~ # cat /proc/28344/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          21409488896          unlimited            bytes
Max processes             192094               192094               processes
Max open files            200000               200000               files
Max locked memory         unlimited            unlimited            bytes
Max address space         21874114560          unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       192094               192094               signals
Max msgqueue size         8192000              8192000              bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us


So prior to tuning, see what is actually happening. You might find you are actually restricting the limits. Now, to the gasping Linux sysadmins: obviously setting limits to "unlimited" is bad practice, as these limits are designed to keep rogue processes or users from killing the system. But... I never said I was a good Linux admin. I'm just a guy who wants GroupWise to work.
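
A minimal sketch of that check, wrapped so it also copes with older kernels that have no /proc/PID/limits file. The function name show_limits is made up for illustration, and pgrep -x gwpoa assumes the agent's process name is exactly gwpoa, as in the ps output above.

```shell
# Hypothetical helper: print the resource limits of a running process,
# if the kernel exposes them under /proc.
show_limits() {
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    if [ -r "/proc/$pid/limits" ]; then
        cat "/proc/$pid/limits"
    else
        # Older kernels do not provide this file at all.
        echo "/proc/$pid/limits not available (kernel $(uname -r))" >&2
        return 2
    fi
}

# Show the limits of every running gwpoa.
# pgrep -x matches the exact process name, so no stray grep shows up.
for pid in $(pgrep -x gwpoa); do
    echo "== gwpoa PID $pid =="
    show_limits "$pid"
done
```

If the loop prints nothing, no process named gwpoa was found; show_limits $$ is a quick way to try the function against the current shell instead.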

- - -

In the absence of ulimit commands in your GroupWise startup file, /etc/security/limits.conf can be used to set the limits. Here is the tail end of ours:

* soft nofile 32769
* hard nofile 65536
* - core unlimited
* - msgqueue 8192000
* - memlock unlimited
# End of file
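
If you do keep a ulimit line in the startup script, one way to avoid accidentally lowering a limit is to raise the soft limit only when it is below what you want. This is a sketch under that assumption, not what the stock grpwise script does; raise_nofile is a hypothetical helper and 2048 is just the value mentioned earlier in the thread.

```shell
# Hypothetical helper: raise the soft open-files limit to at least $1,
# but never lower a limit that is already higher.
raise_nofile() {
    want=$1
    cur=$(ulimit -S -n)
    # Nothing to do if the current soft limit is already high enough.
    [ "$cur" = "unlimited" ] && return 0
    [ "$cur" -ge "$want" ] && return 0
    # Only the soft limit is touched; this fails if $want exceeds the hard limit.
    ulimit -S -n "$want" 2>/dev/null || {
        echo "cannot raise open files to $want (hard limit: $(ulimit -H -n))" >&2
        return 1
    }
}

raise_nofile 2048
echo "open files: soft=$(ulimit -S -n) hard=$(ulimit -H -n)"
```

Because the function only ever raises the soft limit, a generous value inherited from limits.conf is left alone.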

- - -

Rig the server to get cores when agents crash. See TID 3447847 (http://www.novell.com/support/viewContent.do?externalId=3447847) and TID 3054866 (http://www.novell.com/support/viewContent.do?externalId=3054866) for the generic Linux side of things. The latter TID is the more important one. Then, when an agent dies, it will drop a core file that can be analyzed.

After the server is rigged, you can easily test whether cores are being taken.

1. Open an SSH session to the server and start vi.
2. Open another SSH session, become root, and in it...
3. Run ps -ef | grep vi to find the PID of your vi, e.g. 31534 or whatever.
4. Run kill -SIGABRT the_pid_of_vi, e.g. kill -SIGABRT 31534

If you did everything correctly, the other SSH session running vi will close, and you will have a shiny new core in /usr/local/dumps. If so, you are ready: the next time the agent explodes, you'll have something for Novell.
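
The same test can be scripted with a throwaway sleep process instead of vi. This is a rough sketch, assuming a shell where wait reports 128 plus the signal number for a killed child (bash does); core_test is a made-up name, and /usr/local/dumps is simply the directory mentioned above.

```shell
# Hypothetical self-test: abort a throwaway process and report what happened.
core_test() {
    # A soft core limit of 0 means no core file will be written.
    echo "core size limit (soft): $(ulimit -S -c)"
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"

    sleep 300 &                 # throwaway victim process
    victim=$!
    kill -s ABRT "$victim"      # same signal as kill -SIGABRT
    wait "$victim"
    status=$?

    # Show the newest files in the dump directory, if it exists.
    if [ -d /usr/local/dumps ]; then
        ls -lt /usr/local/dumps | head -n 5
    fi

    # SIGABRT is signal 6, so a child killed by it reports 128 + 6 = 134.
    echo "victim exit status: $status"
    [ "$status" -eq 134 ]
}

core_test
```

The exit status only confirms that the process was aborted; whether a core file was actually written still has to be checked in the dump directory.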

Use novell-getcore to package it up.

-- Bob
mschlawin Frequent Contributor.

Re: PO Crashes - glibc double free or corruption

Thanks for the reply. It's been running since 1:00 yesterday, so I thought I had it, but it just crashed again a few minutes ago.

I see the PID of my PO as 31247. If I go to the /proc/31247 directory, I do NOT see a limits file. Here are the results of "l /proc/31247":

total 0
dr-xr-xr-x 5 root root 0 Feb 15 08:16 ./
dr-xr-xr-x 257 root root 0 Feb 14 05:39 ../
dr-xr-xr-x 2 root root 0 Feb 15 08:27 attr/
-r-------- 1 root root 0 Feb 15 08:27 auxv
-r--r--r-- 1 root root 0 Feb 15 08:16 cmdline
-r--r--r-- 1 root root 0 Feb 15 08:27 cpuset
lrwxrwxrwx 1 root root 0 Feb 15 08:27 cwd -> /opt/novell/groupwise/agents/bin/
-r-------- 1 root root 0 Feb 15 08:27 environ
lrwxrwxrwx 1 root root 0 Feb 15 08:16 exe -> /opt/novell/groupwise/agents/bin/gwpoa*
dr-x------ 2 root root 0 Feb 15 08:16 fd/
-rw-r--r-- 1 root root 0 Feb 15 08:27 loginuid
-rw------- 1 root root 0 Feb 15 08:27 mapped_base
-r--r--r-- 1 root root 0 Feb 15 08:27 maps
-rw------- 1 root root 0 Feb 15 08:27 mem
-r--r--r-- 1 root root 0 Feb 15 08:17 mounts
-rw-r--r-- 1 root root 0 Feb 15 08:27 oom_adj
-r--r--r-- 1 root root 0 Feb 15 08:27 oom_score
lrwxrwxrwx 1 root root 0 Feb 15 08:27 root -> //
-rw------- 1 root root 0 Feb 15 08:27 seccomp
-r--r--r-- 1 root root 0 Feb 15 08:27 smaps
-r--r--r-- 1 root root 0 Feb 15 08:16 stat
-r--r--r-- 1 root root 0 Feb 15 08:16 statm
-r--r--r-- 1 root root 0 Feb 15 08:16 status
dr-xr-xr-x 82 root root 0 Feb 15 08:27 task/
-r--r--r-- 1 root root 0 Feb 15 08:27 wchan

Could that be part of my problem? BTW, I am running GW in the "show" mode so I can see what is happening. I did not issue a "rcgrpwise start", but instead ran the MTA and PO by issuing the "gwpoa --show @<config>.poa &" command. Did bypassing the grpwise startup file cause a problem?
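
For what it's worth, a ulimit line in /etc/init.d/grpwise only affects processes started by that script; an agent launched by hand inherits the limits of the shell it was launched from. A small sketch of that inheritance (inherited_nofile is a made-up helper, and 256 is an arbitrary value chosen to sit below typical defaults):

```shell
# Hypothetical helper: set a soft open-files limit, then ask a child
# process what limit it received.
inherited_nofile() {
    # Run in a subshell so the lowered limit does not leak into the caller.
    ( ulimit -S -n "$1" && sh -c 'ulimit -S -n' )
}

# The child reports the value set by its parent, provided the hard limit allows it.
inherited_nofile 256
```

So an agent started from an interactive shell gets whatever ulimit -n reports in that shell, regardless of what the init script contains.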

Matt



>>> On 2/14/2011 at 8:36 PM, in message <Bob-O-Rama.4p6iy0@no-mx.forums.novell.com>, Bob-O-Rama<Bob-O-Rama@no-mx.forums.novell.com> wrote:





--
Bob Mahar -- Novell Knowledge Partner
Do you do what you do at a .EDU? http://novell.com/ttp
"Programming is like teaching a jellyfish to build a house."
More Bob: 'Twitter' (http://twitter.com/BobMahar) 'Blog'
(http://blog.trafficshaper.com) 'Vimeo' (http://vimeo.com/boborama) <--
Click And Be Amazed!
------------------------------------------------------------------------
Bob-O-Rama's Profile: http://forums.novell.com/member.php?userid=5269
View this thread: http://forums.novell.com/showthread.php?t=432440
Knowledge Partner

Re: PO Crashes - glibc double free or corruption


Hi Matt,

You mention running 8 SP2. There is, however, a later patch version, 8 SP2 HP2 (802hp2), which does fix crashes. I'd suggest applying 802hp2 before doing anything else.

Hope that helps,
Willem
Bob-O-Rama
Visitor.

Re: PO Crashes - glibc double free or corruption

You are running SLES 10, so there might not be a limits file. It is not missing; older kernels simply do not provide it. Newer versions of Linux expose more and more stuff via /proc, which is used primarily to expose kernel-level information and settings.

BTW, I don't think your issue is the limits. It's memory corruption.

Be sure you are running the latest code (GW 8.0.2 HP1 at least, and now HP2, released a couple of weeks ago).
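
A quick way to see which GroupWise packages are installed before and after patching, assuming the agents were installed from RPMs. The grep pattern is a guess at the package naming, not taken from Novell documentation, and filter_gw is a made-up helper.

```shell
# Hypothetical helper: keep only lines that mention GroupWise (case-insensitive).
filter_gw() {
    grep -i 'groupwise'
}

# Query the RPM database if rpm is present; print nothing otherwise.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | filter_gw || true
fi
```

The version string in each package name should change once the hot patch is applied.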

If the agents continue to crash with the latest code, you may need to get dump file(s) to Novell for analysis. I like setting this up ahead of time so that I get meaningful info the first time an agent crashes.

-- Bob
mschlawin Frequent Contributor.

Re: PO Crashes - glibc double free or corruption

Thanks. I have this issue posted on the NGWList as well, and they also suggested that it might be fixed in HP2.

I'm going to try to duplicate the error after school today and will then apply HP2.

Thanks again for your help!

Matt


>>> On 2/15/2011 at 12:36 PM, in message <Bob-O-Rama.4p7re2@no-mx.forums.novell.com>, Bob-O-Rama<Bob-O-Rama@no-mx.forums.novell.com> wrote:



mschlawin Frequent Contributor.

Re: PO Crashes - glibc double free or corruption

So far I am optimistic that the problem is solved. Since applying HP2 I have not had any crashes.

Thanks for all your help!

Matt


>>> On 2/15/2011 at 1:04 PM, in message <4D5A79CE.EB76.001E.1@FVLHS.ORG>, Matt Schlawin<MSchlawin@FVLHS.ORG> wrote:


fzratt Absent Member.

Re: PO Crashes - glibc double free or corruption

Did you actually apply both HP1 and HP2 or just HP2? Having the same issue and could use advice on whether both or just the latest need to be applied. Thanks.
Knowledge Partner

Re: PO Crashes - glibc double free or corruption



You only need to apply the latest GroupWise patch, which at this moment is 802hp3. You can find the Linux version on the Novell downloads site as "GroupWise 8.0 SP2 HP3 Linux Full EN and MULTI".

-Willem