Anonymous_User Absent Member.

NSS Pool Rebuild Problems

I'm running NWSB 6.5 SP4. A pool deactivated on me and I was unable to remount it. I ran NSS /PoolVerify and got various errors (the same as in TID 10082622, "NSS Rebuild stops due to volume corruption"):
http://www.novell.com/support/search.do?cmd=displayKC&docType=kc&externalId=10082622&sliceId=&dialogID=6183092&stateId=0%200%206185318

I updated to N65NSS4B.exe and re-ran NSS /PoolVerify, but got the same errors.

Here's the problem. When I run NSS /PoolRebuild, it gets to 24.94494%, then CPU utilization jumps to 100% and Logger shows "ValidateNode found problems in Znode", which keeps scrolling off the screen. I also tried NSS /PoolRebuild /purge, with the same effect.

Any ideas?

Soroush

PS: Current NSS version 3.23.04 Sept. 26, 2005 (Build 1027 MP)

Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

Soroush Madjzoob wrote:

> Any ideas?


a) if it is not SYS that is borking on you, I would update to SP5
b) make sure you unload virus scan, java, etc - pretty much anything you can - before doing the scan

--
Joe Moore
Novell Support Forums SysOp
http://just.fdisk-it.com
http://www.caledonia.net/jmdns.html
http://www.caledonia.net/nesadmin.html
http://www.caledonia.net/jmttb.html
Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

Joseph,

I hope you get this soon, as I'm about to wipe and restore my volumes, but my backup's a bit shaky!

Do you feel that the SP5 update would make a difference in finishing the rebuild and recovering the NSS pool?

I'm currently running the patched version of NSS from N65NSS4b, and it still does not work.

I've also tried NSS /PoolRebuild on each copy of the mirror independently, and they both give me the same result.



Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

Soroush Madjzoob wrote:

>Do you feel that the SP5 update would make a difference in finishing the
>Rebuild and recovering the NSS Pool?


it may or may not - sorry I can't be more definite

--
Joe Moore
Novell Support Forums SysOp
http://just.fdisk-it.com
http://www.caledonia.net/jmdns.html
http://www.caledonia.net/nesadmin.html
http://www.caledonia.net/jmttb.html
Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

Speaking of corruption!

I had a complete meltdown with our NSS pools recently, in which they became corrupted and deactivated.
(Details: TID 10082622, "NSS Rebuild stops due to volume corruption")
http://www.novell.com/support/search.do?cmd=displayKC&docType=kc&externalId=10082622&sliceId=&dialogID=6183092&stateId=0%200%206185318

I think the corruption began either when I took a snapshot of an NSS pool or because of OFM.nlm, the backup remote agent from Veritas. The two pools reside on two separate servers, and I only took a snapshot of one NSS pool. I should note that once I applied N65NSS4b.exe and ran NSS PoolVerify/Rebuild, one pool was recovered but was backdated to about February 2006, roughly five months ago. The other pool (the one with the exact errors described in the TID above) I had to blow out, recreate, and restore from backup.

Now I don't trust my NSS pools to be healthy. So my questions are:

1) Is there a safe way to check that an NSS pool is healthy and free of corruption?
2) Are there known issues with taking an MM Snap of a pool with NW 6.5 SP4?
3) Are there known issues with the backup agents OFM/OTM, as described by the TID?
4) Is it possible that the mirrored partitions on each pool showed as synchronized but weren't? And that, say, four months later when they did synchronize, it corrupted the pools?


Thank you,

Soroush

Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

On Sat, 01 Jul 2006 10:18:43 -0700, Soroush Madjzoob <Soroush@santech.net>
wrote:

See Andrew's response in the other thread.

/dps

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

On Sat, 01 Jul 2006 10:18:43 -0700, Soroush Madjzoob <Soroush@santech.net>
wrote:

Also,
> 4) Is it possible that the mirrored partitions on each Pool showed
> synchronized but they weren't? And say, four months later when they did
> synchronize, it corrupted the Pools?


I'd consider this unlikely, based on my experience with the mirroring
software (it's in Media Manager, BTW -- mm.nlm). I have seen a few
situations where the mirroring got stuck at 98% until coaxed, but I have
never seen it claim to be complete when it wasn't, and I haven't seen it
cause corruption.

/dps

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

Well, normally I'd agree, but here's what I'm basing this on!

So again, I have two iSCSI boxes acting as SANs, and they are being mirrored by the servers. For simplicity, let's call them SAN A and SAN B. I had to restart SAN A because its memory was getting too low (1GB RAM), and the mirroring status NEVER read "mirror partitions are not synchronized", "mirroring stopped", "mirroring aborted", or anything like that. It read "Fully synchronized" the whole time SAN A went down and came back up.

Now, when I downed and brought up SAN B one time, it quickly registered that mirroring had stopped and then started again. Then, the next day, the NSS pool got deactivated due to corruption.

Before the corruption, we had trouble getting into one of our folders, DATA:Users; at times it would take forever to list. Eventually, it all shut down.

Thank you,

Soroush

Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

On Mon, 03 Jul 2006 18:00:54 -0700, Soroush Madjzoob <Soroush@santech.net>
wrote:
[..]
> So again, I have two iSCSI boxes acting as SANs. They are being
> mirrored by the servers. For simplicity, let's say SAN A and B. I had
> to restart SAN A due to memory getting to be too low (1GB RAM) and the
> mirroring status NEVER read "mirror partitions are not synchronized" or
> "mirroring stopped" or "mirroring aborted" or anything. It read "Fully
> synchronized" while SAN A went down and came back up?


How long was SAN A down? Were any volumes mounted at the time? Did you
get any "device deactivated" messages while the box was down?

> Now, when I downed and brought up SAN B one time, it quickly registered
> that mirroring had stopped and then started again. Then, the next day,
> the NSS pool got deactivated due to corruption.


Same questions...are the answers different?

If the driver never told MM that SAN A was down (sorry, I have trouble
with that name...to me the SAN is more than just one storage device, just
like the LAN is more than one NIC), then the mirroring software wouldn't
know it was gone until a write failed. If you had volumes mounted and
busy, you'd likely get an error within a few seconds. If you had volumes
mounted but idle, then you have approximately a 40-second window between
file system updates. If no volumes were mounted, then you have forever
without I/O.
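
To put rough numbers on the three cases above, here is a minimal Python sketch. It is only an illustration of the reasoning, not anything taken from mm.nlm: the function names are made up, the 2-second figure for a busy volume is an assumed stand-in for "a few seconds", and the 40-second figure is the approximate idle window mentioned above.

```python
from typing import Optional

# Illustrative figures only; both are assumptions, not measured values.
BUSY_WRITE_INTERVAL_S = 2.0    # stand-in for "within a few seconds" on a busy volume
IDLE_FLUSH_INTERVAL_S = 40.0   # approximate window between file system updates when idle


def worst_case_detection_delay(volumes_mounted: bool, busy: bool) -> Optional[float]:
    """Longest wait (seconds) before a write is attempted and can fail.

    Returns None when no volumes are mounted, because no write is ever
    issued and a missing device would never be noticed.
    """
    if not volumes_mounted:
        return None
    return BUSY_WRITE_INTERVAL_S if busy else IDLE_FLUSH_INTERVAL_S


def outage_noticed(outage_s: float, volumes_mounted: bool, busy: bool) -> bool:
    """True if an outage of this length should overlap at least one write."""
    delay = worst_case_detection_delay(volumes_mounted, busy)
    return delay is not None and outage_s >= delay


if __name__ == "__main__":
    print(outage_noticed(300, volumes_mounted=True, busy=True))    # True
    print(outage_noticed(10, volumes_mounted=True, busy=False))    # False
    print(outage_noticed(300, volumes_mounted=False, busy=False))  # False
```

Under these assumptions, any outage of more than a minute or so with mounted volumes should hit at least one write, so a missing error points at the layer that was supposed to report the failed write.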

Updating your driver is likely to be the big step.

/dps

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

SAN A was down for roughly 5 minutes, about the time it takes to reboot (restart server) a pretty vanilla NetWare box. However, this was during high activity, with several volumes mounted, including a GroupWise volume. It is highly unlikely that there was little or no I/O at the time.

As far as I know, the .dsk/.ham drivers are updated. Plus, all I/O requests would be handed to the iSCSI layer, since the physical disk is attached via iSCSI.

In either case, there were no errors or warnings while SAN A went down.

Soroush

Anonymous_User Absent Member.

Re: NSS Pool Rebuild Problems

On Tue, 04 Jul 2006 18:13:40 -0700, Soroush Madjzoob <Soroush@santech.net>
wrote:

> SAN A was down for roughly 5 minutes; about the time it took to reboot
> (restart server) a pretty vanilla NetWare box. However, it was during
> high activity, and had several volumes mounted including a GroupWise
> volume. Highly unlikely that there was little/no I/O at the time.
>
> As far as I know, the .dsk/.ham drivers are updated.


Hope you don't have .dsk drivers anymore. The .ham's that would apply
would be on the iSCSI target box, other than iscsiham.ham.

> Plus, all I/O requests would be handed to the iSCSI layer since the
> physical disk is attached via iSCSI.
>
> In either case ... no errors or warnings while SAN A went down.


Well, it sounds like the iSCSI driver flubbed up and didn't report that
SAN A was down. There should have been errors when reads or writes
failed, but I guess that didn't happen, and the mirror code thought all
was well. This sounds like a serious error in the driver line...although
the B57 driver might have been part or all of the problem, given its
reputation.
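
To illustrate the failure mode described above, here is a toy Python model. Everything in it is hypothetical (the `Leg`, `Mirror`, and `driver_write` names, the in-memory "blocks", and the simplified status strings); it does not reflect how mm.nlm or any iSCSI driver is actually implemented. It only shows why a mirror that depends on the driver to report failed writes can keep saying "Fully synchronized" while the two sides drift apart.

```python
class Leg:
    """One side of the mirror: a hypothetical in-memory stand-in for a device."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}


def driver_write(leg, block, data, reports_errors):
    """Write one block through a hypothetical driver.

    Returns what the caller is told: True for success, False for failure.
    A driver with reports_errors=False claims success even when the device
    is down, which is the suspected fault in this thread.
    """
    if leg.online:
        leg.blocks[block] = data
        return True
    return not reports_errors


class Mirror:
    """Toy mirror that only learns about a dead leg from a failed write."""

    def __init__(self, legs, reports_errors):
        self.legs = legs
        self.reports_errors = reports_errors
        self.status = "Fully synchronized"

    def write(self, block, data):
        for leg in self.legs:
            if not driver_write(leg, block, data, self.reports_errors):
                self.status = "Not synchronized"


def simulate(reports_errors):
    """Take leg A down for one write, bring it back, and report the outcome."""
    a, b = Leg("A"), Leg("B")
    mirror = Mirror([a, b], reports_errors)
    mirror.write(0, "before outage")
    a.online = False
    mirror.write(1, "during outage")
    a.online = True
    return mirror.status, a.blocks == b.blocks


if __name__ == "__main__":
    print(simulate(reports_errors=True))   # ('Not synchronized', False)
    print(simulate(reports_errors=False))  # ('Fully synchronized', False)
```

In both runs the legs end up with different contents, but only the run with an honest driver knows it and can trigger a resync. In the other run, later reads may be served from the stale leg, which is one plausible way the data could end up inconsistent.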

/dps

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/