mcclary
New Member.

NCP volumes lost

SLES 11 SP4

We are randomly running into an issue where users can no longer reach mounted volumes.

The only fix so far has been to totally restart the server, which is inconvenient and takes forever.
Any suggestions?


In the log file /var/log/messages:
Jan 10 03:52:09 WGSD-APPS kernel: [1012252.612422] Out of memory: Kill process 3742 (ndsd) score 578 or sacrifice child
Jan 10 03:52:09 WGSD-APPS kernel: [1012252.612427] Killed process 3742 (ndsd) total-vm:5529296kB, anon-rss:3380388kB, file-rss:0kB
Jan 10 03:52:09 WGSD-APPS kernel: [1012472.527344] sfcbd invoked oom-killer: gfp_mask=0x106d0, nodemask=0, order=2, oom_adj=0, oom_score_adj=0
Jan 10 03:52:09 WGSD-APPS kernel: [1012472.527348] sfcbd cpuset=/ mems_allowed=0

ncp2nss.log:
[! 2018-01-10 03:52:08] IPCClient::ReceiveReply failed bytesReceived (-1) != replyHeader (12)
[! 2018-01-10 03:52:08] IPCServRequest open/send/received failed rc=107
[! 2018-01-10 03:52:08] ... ncp server ping FAILED rc=107
[! 2018-01-10 03:52:23] IPCClient::Open connect failed rc=111
[! 2018-01-10 03:52:23] IPCServRequest open/send/received failed rc=111
[! 2018-01-10 03:52:23] ... ncp server ping FAILED rc=111
[! 2018-01-10 03:52:38] IPCClient::Open connect failed rc=111
[! 2018-01-10 03:52:38] IPCServRequest open/send/received failed rc=111
[! 2018-01-10 03:52:38] ... ncp server ping FAILED rc=111

ncpserv.log:
[! 2018-01-10 03:48:03] SendBroadcastPing: sendto() conn:54 err:104, errmsg:Connection reset by peer
[! 2018-01-10 03:48:05] SendBroadcastPing: sendto() conn:87 err:32, errmsg:Broken pipe
[! 2018-01-10 03:48:05] SendBroadcastPing: sendto() conn:257 err:9, errmsg:Bad file descriptor
[W 2018-01-10 03:48:07] Killing connection 41
[W 2018-01-10 03:48:14] Killing connection 54
[W 2018-01-10 03:48:21] Killing connection 87

Thank you for your time,

Liz
Knowledge Partner

Re: NSS volumes lost

Hi.

On 10.01.2018 18:14, mcclary wrote:
>
> SLES 11 SP4


And OES? What is your current patchlevel?

> The only fix so far has been to totally restart the server, which is
> inconvenient and takes forever.
> Any suggestions?


1. Patch your server.
2. You don't need to reboot. "rcndsd restart" should do it too, as
ndsd is what crashes.

>
> In the log file var/log/messages :
> Jan 10 03:52:09 WGSD-APPS kernel: [1012252.612422] Out of memory: Kill
> process 3742 (ndsd) score 578 or sacrifice child


You may also want to find out why your server runs out of memory. That
may or may not be due to ndsd itself.
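
One way to see which processes the OOM killer has been taking down, and how large each one was at the time, is to pull the "Killed process" lines out of the syslog file. This is only a minimal sketch, assuming the kernel message format shown in the log excerpt above; the function name oom_kills is made up for illustration, and process names containing spaces are not handled.

```bash
# List OOM-killer victims recorded in a syslog file.
# Prints one line per kill: timestamp, PID, process name, anon-rss in MiB.
# oom_kills is a hypothetical helper name, not an existing tool.
oom_kills() {
    local logfile="${1:-/var/log/messages}"
    awk '/Killed process [0-9]+ \(/ {
        pid = ""; name = ""; rss = ""
        for (i = 2; i <= NF; i++) {
            # "Killed process <pid> (<name>)" gives PID and name
            if ($i == "process" && $(i - 1) == "Killed") { pid = $(i + 1); name = $(i + 2) }
            # "anon-rss:<n>kB," gives the resident anonymous memory
            if ($i ~ /^anon-rss:/) { rss = $i }
        }
        gsub(/[()]/, "", name)      # strip the parentheses around the name
        gsub(/[^0-9]/, "", rss)     # keep only the number of kB
        printf "%s %s %s pid=%s name=%s anon_rss_mib=%d\n", $1, $2, $3, pid, name, rss / 1024
    }' "$logfile"
}
```

Run it as `oom_kills /var/log/messages`. If ndsd shows up as the victim every time with a large anon-rss, that would suggest ndsd itself is the process that keeps growing.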

CU,
--
Massimo Rosen
Micro Focus Knowledge Partner
No emails please!
http://www.cfc-it.de
mcclary
New Member.

Re: NSS volumes lost

Novell OES 2015
Version = 2015.1
PatchLevel = 1
SLES
version = 11.4
SUSE linux Enterprise Server = 11
Version = 11
PatchLevel = 4
Knowledge Partner

Re: NSS volumes lost

We'd need that information to be a little more specific, such as "patched up to and including the December 2017 scheduled maintenance". If the out-of-memory killer kills ndsd, something must be causing that condition, possibly a memory leak in ndsd that has already been fixed. Apart from that: how much memory does this box have? Does it have any jobs other than serving files?
Knowledge Partner

Re: NSS volumes lost

In article <mcclary.8aw69c@no-mx.forums.microfocus.com>, Mcclary wrote:
> Jan 10 03:52:09 WGSD-APPS kernel: [1012252.612422] Out of memory: Kill
> process 3742 (ndsd) score 578 or sacrifice child


So how is memory doing? Start with the free command:
http://www.konecnyad.ca/andyk/freemem.htm
If you are regularly using more than half of swap, you need more RAM.
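
The "more than half of swap" rule of thumb can be checked with a short script. This is a rough sketch, assuming the `Swap:` line layout that `free -m` prints (total, used, free); the function names swap_used_percent and swap_pressure are hypothetical. A single reading says nothing about "regularly", so it would need to be run repeatedly, for example from cron.

```bash
# Read `free -m` output on stdin and print swap usage as a whole percentage.
# Prints nothing if no swap is configured.
swap_used_percent() {
    awk '$1 == "Swap:" && $2 > 0 { printf "%d\n", $3 * 100 / $2 }'
}

# Apply the "more than half of swap" rule of thumb to one `free -m` reading.
swap_pressure() {
    local pct
    pct=$(swap_used_percent)
    if [ -z "$pct" ]; then
        echo "no swap configured"
    elif [ "$pct" -gt 50 ]; then
        echo "swap ${pct}% used: over half, consider more RAM"
    else
        echo "swap ${pct}% used: under half"
    fi
}
```

Usage: `free -m | swap_pressure`.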

As you have Ganglia running by default, check http://ServerNameorIP/gweb
to see the memory graphs for the system, including the longer-term
graphs.


Andy of
http://KonecnyConsulting.ca in Toronto
Knowledge Partner
http://forums.novell.com/member.php/75037-konecnya
If you find a post helpful and are logged in the Web interface, please
show your appreciation by clicking on the star below. Thanks!

mcclary
New Member.

Re: NSS volumes lost

free -m
             total       used       free     shared    buffers     cached
Mem:          5981       5807        174          3         55       3215
-/+ buffers/cache:       2536       3445
Swap:         2047        148       1899


WGSD-APPS:~ # ndsd --version
NetIQ eDirectory 8.8 SP8 v20812.20


After the VM or ndsd is restarted following a crash, ndsd starts consuming memory quickly. Yesterday evening, when I restarted the ndsd service, it grew to 41.1 %MEM in about 4 hours.
Knowledge Partner

Re: NCP volumes lost

In article <mcclary.8c0rgn@no-mx.forums.microfocus.com>, Mcclary wrote:
> after restarting VM or ndsd after crash it starts consuming memory
> quickly. Yesterday evening when I restarted the ndsd service it grew to
> %mem 41.1 in about 4 hours.


Was that free -m just after a restart or closer to a crash?

Is this the only box in the tree?

How big is your eDir?
du -hx --max-depth=1 /var/opt/novell/eDirectory/data/dib/

Number of other servers in the tree? About how many years old is the
tree?

Have you checked basic eDir health? Perhaps you have an issue at that
level that is eating memory:
ndsrepair -T
ndsrepair -E
ndsrepair -C -Ad -A
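
To compare results between servers, it can help to capture the output of those three checks in files. The following is a sketch only: the ndsrepair invocations are exactly the ones listed above, while the function name edir_health_checks, the default output directory and the log file names are assumptions made for illustration. It does not interpret the output or rely on ndsrepair's exit status.

```bash
# Run the three ndsrepair checks listed above and save each one's output.
# edir_health_checks, /tmp/edir-health and the checkN-*.log names are hypothetical.
edir_health_checks() {
    local outdir="${1:-/tmp/edir-health}"
    local stamp args i=0
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1
    for args in "-T" "-E" "-C -Ad -A"; do
        i=$((i + 1))
        # $args is deliberately unquoted so "-C -Ad -A" splits into three options.
        ndsrepair $args > "$outdir/check$i-$stamp.log" 2>&1
        echo "ndsrepair $args -> $outdir/check$i-$stamp.log"
    done
}
```

Run it as root on each server in the tree, for example `edir_health_checks /root/edir-health`, then read through the saved logs for reported errors.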


Andy of
http://KonecnyConsulting.ca in Toronto
mcclary
New Member.

Re: NCP volumes lost

It was closer to a crash.
There are 2 other servers in the tree.
The tree is about 5 years old.

# du -h --max-depth=1 /var/opt/novell/eDirectory/data/dib/
4.0K /var/opt/novell/eDirectory/data/dib/crl.rfl
8.0K /var/opt/novell/eDirectory/data/dib/cert.rfl
24K /var/opt/novell/eDirectory/data/dib/certserv
6.9M /var/opt/novell/eDirectory/data/dib/NDO.rfl
6.9M /var/opt/novell/eDirectory/data/dib/nds.rfl
388M /var/opt/novell/eDirectory/data/dib/
Knowledge Partner

Re: NCP volumes lost

In article <mcclary.8c2qnz@no-mx.forums.microfocus.com>, Mcclary wrote:
> 388M /var/opt/novell/eDirectory/data/dib/


How does this compare to the other servers?
What is/are the biggest file(s) in there? Anything over about 20MB?
How do those eDir health checks come out? Run them on each of the
servers to make sure there are no errors; if there are any, we need to
fix those.
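
The question about the biggest files can be answered with find. This is a minimal sketch, assuming GNU find (for `-size ...M` and `-printf`); the function name big_files is hypothetical, the default path is the DIB directory from the du command earlier in the thread, and the 20 MB default comes from the question above.

```bash
# List files larger than a threshold (in MiB) under a directory, biggest first.
# big_files is a hypothetical helper; the defaults are taken from this thread.
big_files() {
    local dir="${1:-/var/opt/novell/eDirectory/data/dib}"
    local min_mb="${2:-20}"
    # %s is the size in bytes, %p the path; sort numerically, largest first.
    find "$dir" -xdev -type f -size +"${min_mb}M" -printf '%s\t%p\n' |
        sort -rn |
        awk -F'\t' '{ printf "%dM\t%s\n", $1 / 1048576, $2 }'
}
```

Calling `big_files` with no arguments checks the DIB directory for anything over 20 MB; running the same command on the other servers gives the comparison asked for above.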

When noting the amount of memory ndsd is using in top, it is useful to
note the amount under RES as well. If you start top after a reboot and
press M (sort by memory), you can watch whether, and by how much, memory
use grows for the processes at the top of the list.
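
Instead of watching top by hand, the resident size can be logged at intervals so the growth rate is on record. This is a rough sketch, assuming the procps version of ps (for `-C` and `-o rss=`); the function names sample_rss and watch_rss and the default interval and count are made up for illustration.

```bash
# Print "epoch_seconds total_rss_kib" for all processes with the given name.
# Prints 0 as the total if no such process is running.
sample_rss() {
    local name="${1:-ndsd}"
    ps -C "$name" -o rss= |
        awk -v now="$(date +%s)" '{ sum += $1 } END { print now, sum + 0 }'
}

# Take `count` samples, `interval` seconds apart.
watch_rss() {
    local name="${1:-ndsd}" interval="${2:-300}" count="${3:-12}" i=0
    while [ "$i" -lt "$count" ]; do
        sample_rss "$name"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}
```

For example, `watch_rss ndsd 300 48 >> /tmp/ndsd-rss.log` takes a sample every 5 minutes for roughly four hours; a total that climbs steadily from one sample to the next is the pattern to look for.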



Andy of
http://KonecnyConsulting.ca in Toronto