Anonymous_User Absent Member.

OES SP2 namcd leaking FD's -> ldap_initconn fails?

Hi,

it seems my newly installed server is hitting the ldap_initconn problem
that has already been mentioned here a few times. From what I can see in
/var/log/messages and from testing with netstat and lsof, namcd leaks a
file descriptor every few minutes and then seemingly locks up the server.
"Seemingly" because I could still log in via SSH/publickey as root, run
"namconfig cache_refresh", and everything went back to normal.

Now in /var/log/messages there was this:

May 28 21:56:03 oesi1 kernel: open files rlimit 1024 reached for uid 0 pid 16828
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: pam_ldap_init(): ldapssl_add_trusted_cert() failed
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: namGetLDAPHandle failed to get LDAP handle, error 1.
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: nss_ldap_init: Unable to get LDAP handle.
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: ldap_initconn: Error in LDAP init for preferred server, rc = 2.
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: ldap_initconn: LDAP bind failed, trying to connect to alternative LDAP server
May 28 21:56:03 oesi1 /usr/sbin/namcd[16821]: ldap_initconn: Unable to bind to alternative LDAP servers either.


lsof and netstat told me this:

oesi1:~ # date && lsof -n -p 15859|tail -2
Tue May 30 11:49:51 CEST 2006
namcd 15859 root 174u IPv4 4843372 TCP 10.0.0.23:17092->10.0.0.23:ldaps (ESTABLISHED)
namcd 15859 root 175u IPv4 4846238 TCP 10.0.0.23:17327->10.0.0.23:ldaps (ESTABLISHED)

oesi1:~ # date && lsof -n -p 15859|tail -2
Tue May 30 11:53:34 CEST 2006
namcd 15859 root 175u IPv4 4846238 TCP 10.0.0.23:17327->10.0.0.23:ldaps (ESTABLISHED)
namcd 15859 root 176u IPv4 4849289 TCP 10.0.0.23:17632->10.0.0.23:ldaps (ESTABLISHED)

oesi1:~ # netstat -topn|egrep "(17632|17327|17092)"
tcp 0 0 10.0.0.23:17632 10.0.0.23:636 ESTABLISHED 15859/namcd off (0.00/0/0)
tcp 0 0 10.0.0.23:17092 10.0.0.23:636 ESTABLISHED 15859/namcd off (0.00/0/0)
tcp 0 0 10.0.0.23:17327 10.0.0.23:636 ESTABLISHED 15859/namcd off (0.00/0/0)
tcp 0 0 10.0.0.23:636 10.0.0.23:17327 ESTABLISHED 10112/ndsd keepalive (6286.86/0/0)
tcp 0 0 10.0.0.23:636 10.0.0.23:17092 ESTABLISHED 10112/ndsd keepalive (5986.34/0/0)
tcp 0 0 10.0.0.23:636 10.0.0.23:17632 ESTABLISHED 10112/ndsd keepalive (6587.36/0/0)

As the keepalive timers on the ndsd side of the netstat output show (they
differ by roughly 300 seconds each), a new connection is opened every 300
seconds. How can I find out what triggers these connections (strace
doesn't seem to work as expected)? They appear even when the whole network
is shut down and only one other Linux-only server is active.
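
One way to confirm the leak rate without strace is to sample the number of entries in /proc/<pid>/fd over time. The following is only a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target process's fd directory. The names count_fds, watch_fds and PROC_ROOT are made up for this example and are not part of any OES tool.

```shell
# Count the file descriptors currently open in a process.
# PROC_ROOT is a hypothetical override for illustration; it defaults to /proc.
count_fds() {
    # Each entry in <proc>/<pid>/fd is one open descriptor.
    ls "${PROC_ROOT:-/proc}/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print a timestamped descriptor count <samples> times,
# sleeping <interval> seconds between samples.
watch_fds() {
    pid=$1
    interval=$2
    samples=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$pid fds=$(count_fds "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_fds 15859 60 10` would print ten samples one minute apart. A count that grows by one roughly every fifth sample would match the 300-second pattern seen in the lsof output.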

bye,
Franz.

Re: OES SP2 namcd leaking FD's -> ldap_initconn fails?

Hey Franz,

My guess is that you have namcd configured to refresh the cache every
300 seconds. The config file for namcd is /etc/nam.conf, and the option
that controls the refresh interval is persistent-cache-refresh-period=.
If you are not using the persistent cache, namcd makes a request to
your LDAP server whenever an authentication request is made. As for
your problem, I would suggest disabling the persistent cache, but this
will slow down your authentication quite significantly.

-scz
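
To check which refresh period is actually in effect, the option can be read straight from the config file. This is a small sketch, assuming nam.conf uses plain key=value lines as the option name above suggests. get_refresh_period is a made-up helper name, not an OES command.

```shell
# Print the value of persistent-cache-refresh-period from a nam.conf-style file.
# Assumes simple key=value lines; the path defaults to /etc/nam.conf.
get_refresh_period() {
    conf=${1:-/etc/nam.conf}
    # Strip the key and the "=" and print what is left; if the option
    # appears more than once, keep only the last occurrence.
    sed -n 's/^[[:space:]]*persistent-cache-refresh-period[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1
}
```

Running `get_refresh_period` with no argument reads /etc/nam.conf. An empty result means the option is not set in that file.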


Re: OES SP2 namcd leaking FD's -> ldap_initconn fails?

Thanks Scott,

that was it. I forgot about changing that setting :-(. Nevertheless, I
guess this is a bug, because even with the default setting namcd will
run out of file descriptors before one year of uptime.
But I'm happy now, that was the only real problem I had with this new
installation I did in preparation for a later migration of the whole
company.
Now I just have to find a HOWTO to create a rescue CD with updated
kernel and initrd, so I can use dd without OOM problems. It's a pity
that Novell doesn't put up updated CD1 ISOs on each kernel update.
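
The "before one year" estimate can be reproduced with simple arithmetic: if one descriptor leaks per refresh, the time to exhaustion is roughly (limit minus descriptors already open) times the refresh period. The sketch below uses made-up helper names. The 28800-second figure in the usage note is an assumption about the default refresh period and is not stated anywhere in this thread.

```shell
# Seconds until a process hits its descriptor limit when it leaks one
# descriptor per refresh period.
# Arguments: limit, descriptors already open, period in seconds.
secs_to_exhaust() {
    limit=$1
    open=$2
    period=$3
    echo $(( (limit - open) * period ))
}

# The same figure in whole days, rounded down.
days_to_exhaust() {
    echo $(( $(secs_to_exhaust "$1" "$2" "$3") / 86400 ))
}
```

With the 1024 limit from the kernel message and a 300-second period, `secs_to_exhaust 1024 0 300` gives 307200 seconds, which is about three and a half days. With an assumed period of 28800 seconds (8 hours), `days_to_exhaust 1024 0 28800` gives 341 days, which would be consistent with failing just short of a year.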

Bye,
Franz.



Re: OES SP2 namcd leaking FD's -> ldap_initconn fails?


Hi there.

Just posting to share my experience with the "open files rlimit
1024 reached for uid 0 pid xxxx" message. I couldn't find any process
with the PID given. After restarting the namcd service I could see the
number of open connections from namcd to the local LDAP server grow to
1024 in just a minute. /etc/nam.conf looked correct, and a
namconfig cache_refresh didn't solve the issue.

Finally my provider found that, since the server didn't hold an
eDirectory replica, I should point namcd to a replica server in
/etc/nam.conf. Afterwards you can check whether it worked with
namuserlist, for example.

Hope it helps in the future 🙂


--
dtascon

Re: OES SP2 namcd leaking FD's -> ldap_initconn fails?

Thanks for the post!

I am having the exact same issue (except I can run about two weeks before I
hit the rlimit). Unlike yours, my servers do have a replica on them (3 levels
below the root).

I had a feeling my nam.conf was misconfigured, since it has a base
context of "o=org" and the server only has a R/W replica
of "ou=container1,ou=container2,o=org". Running " lsof -p `pgrep namcd` " I
am seeing lots of LDAP connections to servers with a R/W replica of o=org.
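
To see at a glance which servers namcd is connecting to, the lsof output can be summarised per remote endpoint. This is a rough sketch, assuming the connected-socket format shown earlier in the thread (local->remote in the NAME column). count_remote_endpoints is a hypothetical helper name.

```shell
# Read lsof output on stdin and count TCP connections per remote endpoint.
count_remote_endpoints() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # The NAME column of a connected socket looks like local->remote.
                if (index($i, "->") > 0) {
                    split($i, ends, "->")
                    count[ends[2]]++
                }
            }
        }
        END {
            # One line per remote endpoint: connection count, then address.
            for (r in count) print count[r], r
        }
    ' | sort -rn
}
```

A possible invocation is `lsof -n -p "$(pgrep namcd)" | count_remote_endpoints`, which lists the busiest remote endpoints first. A steadily growing count for a single server would point at where the leaked connections are going.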

I've had an SR open on this for a while, but I'm getting the runaround:
changing servers to cache-only, enabling persistent caching, changing the
cache-refresh timing. Nothing seems to help. This is annoying since the
servers are off site, and once namcd starts spouting the 'rlimit 1024'
message I can't ssh in to the server anymore.

Post back if this error crops up again in spite of adding the replicas. I'd
like to know I'm not the only one having these crazy errors.
-Paul
