dbgallo1 Absent Member.

NCS NSS shares not able to connect, edir showing errors

We are running four OES 11 SP2 virtual servers (patched to January 2015; 12 GB RAM and 6 processors each) on ESX 5.5. We use this cluster only for NSS shares with CIFS and AFP enabled. At random intervals, the NSS shares on certain cluster members become unreachable. This is what we have observed:

1. We cannot SSH to the server: the password prompt appears, but nothing is returned after that.
2. eDirectory shows -625 errors for the server.
3. We cannot log in to the server through vSphere.
4. No one can map drives to resources hosted on this node.
5. Nagios reports that all swap is in use.
6. Eventually users cannot authenticate: "ERROR: CODIR: treeLoginUser: Failed to connect to local DS Agent for user: USERNAME, context: 1908998149, error: -625"
7. Other CIFS errors appear: "ERROR: AUTH: SEV maintenance: Retrieval of SEV list has failed for the user: DIFFERENT USER, context: 1908998146, error: -625"
8. Then we get looping CIFS errors: "CRITICAL: AUTH: Failed to connect to eDirectory. ServerIP: 127.0.0.1,Error: -625" and "CRITICAL: AUTH: Error in connecting to Local DS Agent: -625"

These continue until we reboot the server via vSphere.

The cluster does not migrate the volumes until the reboot. If you do manage to SSH in (it is very slow), you cannot migrate the resources from the command line or from iManager; they go comatose.

Any ideas?

thanks!
Knowledge Partner

Re: NCS NSS shares not able to connect, edir showing errors

On 03.04.2015 at 14:26, dbgallo wrote:
>
> Our situation here is we are running 4 OES 11SP2 (patched to January
> 2015) virtual servers (12GB RAM, 6 Procs each) on ESX 5.5. We only use
> this cluster for NSS shares with CIFS and AFP enabled. We have a random
> issue where the NSS shares on certain cluster members becoming
> unreachable. What we have observed is the following:


Check /var/log/messages on the servers, and in particular check whether the OOM
killer was invoked. This sounds like a memory/resource leak.
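
A rough sketch of that log check, in case it helps. The two search strings are the messages the Linux kernel normally logs when the OOM killer runs; the function names are made up for illustration, and rotated or compressed logs would need to be scanned separately.

```python
import re

# Messages the kernel usually writes when the OOM killer runs.
# "Kill" also matches the "Killed process" wording of newer kernels.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that mention the OOM killer."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_file(path="/var/log/messages"):
    """Scan one log file (hypothetical helper; the path is an assumption)."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Calling `scan_file()` on each node returns the matching lines; an empty list only means nothing was logged in that file, not that memory pressure was absent.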

CU,
--
Massimo Rosen
Novell Knowledge Partner
No emails please!
http://www.cfc-it.de
dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

The CIFS messages mentioned above and the Nagios messages about swap are the only clues in /var/log/messages. I searched the log for "oom" and came up empty-handed.

Knowledge Partner

Re: NCS NSS shares not able to connect, edir showing errors

On 06.04.2015 16:16, dbgallo wrote:
>
> The above mentioned CIFS messages and nagios messages about SWAP are the
> only clues in /var/log/messages. I searched the log for oom and cam up
> empty handed



Probably you just caught it "too" fast. At any rate, the swap being eaten
up is a clear indicator of a memory leak somewhere. Now the question
is... where. What are those servers running? Especially any third-party
or non-OES native stuff?
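
One way to narrow down "where" is to rank processes by how much swap they hold. The sketch below assumes the kernel exposes a VmSwap line in /proc/<pid>/status (worth verifying on the OES 11 SP2 kernel); all function names are illustrative only.

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Extract VmSwap (in kB) from the text of /proc/<pid>/status, 0 if absent."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def parse_name(status_text):
    """Extract the process name from the text of /proc/<pid>/status."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else "?"


def swap_by_process(proc_root="/proc"):
    """Return (swap_kb, pid, name) tuples, largest swap user first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited while we were scanning
        kb = parse_vmswap_kb(text)
        if kb:
            usage.append((kb, int(entry), parse_name(text)))
    return sorted(usage, reverse=True)
```

Running `swap_by_process()` from cron every few minutes and keeping the top few entries would leave a trail to read after the next lockup, since the box is unreachable once it happens.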

CU,
--
Massimo Rosen
Novell Knowledge Partner
No emails please!
http://www.cfc-it.de
dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

In some cases we have two hours of logs on either side of the reboot. The only third-party software on these boxes is VMware Tools and Tivoli backup (TSM). The OES software is AFP, CIFS, eDirectory, NSS, and iManager, plus the associated support structure for those services.


toblerone Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

dbgallo wrote on Tuesday, 7 April 2015 17:16 in novell.support.open-enterprise-server.linux.storage-and-backup:

>
> Have 2hrs on either side of the reboot in some cases, the only thing
> that is 3rd party on these boxes is VMware tools and tivoli back up
> (TSM) . The OES software is afp, cifs, edir, nss, and iManager and the
> associated support structure for those services
>


Check Tivoli! Does it use deduplication on the cluster node?
Check the dsm.sys. It is a good idea to use an identical dsm.sys on all nodes,
but divide it into separate sections for each node and each resource.
Check the NSS config. Tivoli does not use SMS/TSA.

Bernd

dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

That's how we are set up. This only started after the December patches.
toblerone Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

dbgallo wrote on Friday, 10 April 2015 15:56 in novell.support.open-enterprise-server.linux.storage-and-backup:

> That's how we are setup, this just started since the December patches

Then check the I/O wait states (with top). Maybe there is a bottleneck.
Check that the latest VMware Tools are installed.

Then make sure there is enough performance for reads and writes to the
SAN. (Do you use raw devices for the cluster in ESX?)
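
If top is too sluggish to use when the node starts to hang, the same I/O wait figure can be derived from two samples of the aggregate `cpu` line in /proc/stat, where the fifth counter is iowait. This is only a sketch; the function names and the 5-second default interval are assumptions.

```python
import time


def parse_cpu_line(stat_text):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two counter samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Fields 0-7: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already counted inside user/nice, so it is left out.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the iowait share."""
    with open(path) as fh:
        before = parse_cpu_line(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_cpu_line(fh.read())
    return iowait_percent(before, after)
```

Logging `sample_iowait()` periodically gives a history to compare against the time of the next hang.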

Bernd
dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

Yes on the RDMs. The SAN disks are SATA and the data network is fiber. The problem is that when the server locks up we cannot log in to pull any data, so we are limited to the logs. We are going to patch to current this week and test from there.
dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

Follow-up: it looks like when the volume comes up, 42,000 NCP connections are established, which overwhelms the server, with 12 procs running at 90%. We have opened an SR with Novell.
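
For anyone who wants to watch for the same connection storm, here is a rough sketch that counts established TCP connections on the NCP port (524) by parsing /proc/net/tcp. It is not how we arrived at the 42,000 figure; the function name is illustrative, and IPv6 connections would be in /proc/net/tcp6 instead.

```python
NCP_PORT = 524          # well-known TCP port for NCP
TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp


def count_established(proc_net_tcp_text, local_port=NCP_PORT):
    """Count ESTABLISHED connections whose local port matches local_port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0A01010A:020C" (hex address, hex port)
        port_hex = fields[1].rsplit(":", 1)[1]
        if int(port_hex, 16) == local_port and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count
```

Feeding it the contents of /proc/net/tcp right after a resource comes online would show how quickly the connection count climbs.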
dbgallo1 Absent Member.

Re: NCS NSS shares not able to connect, edir showing errors

No I/O errors and nothing jamming the host. Yes, we are using RDMs for the mappings.