Anonymous_User Absent Member.

NSS-2.70-5009 pool & NSS-2.70-5008 volume errors

Hi folks,

I have a Dell PowerEdge 1750 with a PowerVault 220S hung off the back of it,
running NetWare 6.5 SP2. The SYS volume is RAID 1 and physically located in
the 1750. The DATA volume is RAID 5 and physically located in the
PowerVault 220S (a total of 8 drives).

On 12/24/05 at 18:00, the BrightStor ARCserve Backup v11.1 log shows a failure
to receive data from the client agent. ARCserve was near the end of its
full backup of this volume (the job is multiplexed). The server had
'multiple abends on processor 0' at the top of the screen, with garbage text
filling the rest in multiple colours; very pretty, really, but not your
normal abend. The server was not reachable in any way that I could find and
appeared to be locked (remote console, keyboard, ping: nothing worked).
5 of the 8 drives in the PowerVault were blinking their orange LEDs, which
is not good.

I cold booted the whole thing, crossing my fingers. The PERC controller
reported 2 logical drives found, 1 failed. I hit Ctrl-Alt-Del because I
missed the prompt to get into the PERC controller setup, and this time it
reported 0 logical drives found. I called Dell: the PERC controller had lost
its configuration. The 5 drives that were failing show 0 media errors and
hundreds of other errors. The Dell rep thinks the other errors were caused
when the failure occurred (at the time they thought a power supply had gone
out, but that turned out to be incorrect). I copied the config from the
drives to NVRAM (which was empty), recreated the DATA array and rebooted.
Everything is ducky.

Except it's not. Since the failure I have received the following pair of
errors 11 times. They seem to be related to the backup, although two sets of
the errors (judging by the timestamps) fall between backup sessions, only
minutes away from one.

12-30-2005 1:33:22 am: COMN-3.22-178
Severity = 5 Locus = 3 Class = 0
NSS-2.70-5009: Pool RKK61/DATA had an error
(20012(beastTree.c[506])) at block 22451084(file block -22451084)(ZID 1).

12-30-2005 1:33:22 am: COMN-3.22-180
Severity = 5 Locus = 3 Class = 0
NSS-2.70-5008: Volume RKK61/DATA had an error
(20012(beastTree.c[506])) at block 22451084(file block -22451084)(ZID 1).
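Matching the error pairs against backup sessions can be done by eye, but with 11 occurrences a small script may help. Below is a minimal Python sketch, assuming the console log is available as plain text in exactly the format shown above, with entries separated by blank lines. The regular expressions, the `NssError` record, `parse_log`, `outside_windows`, and the 10-minute default slack are all hypothetical names and choices for illustration, not a Novell tool or a documented log specification; the backup windows would have to be taken from the ARCserve job log by hand.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple

# Timestamp format seen in the console log, e.g. "12-30-2005 1:33:22 am".
TS_FORMAT = "%m-%d-%Y %I:%M:%S %p"

# Header line of an entry, e.g. "12-30-2005 1:33:22 am: COMN-3.22-178".
HEADER_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{1,2}:\d{2}:\d{2} [ap]m): (\S+)$")

# NSS message body, e.g. "NSS-2.70-5009: Pool RKK61/DATA had an error
# (20012(beastTree.c[506])) at block 22451084 ...", possibly wrapped.
NSS_RE = re.compile(
    r"(NSS-[\d.]+-\d+): (Pool|Volume) (\S+) had an error\s+"
    r"\((\d+)\(([\w.]+)\[(\d+)\]\)\) at block (\d+)"
)


class NssError(NamedTuple):
    when: datetime
    code: str    # e.g. "NSS-2.70-5009"
    kind: str    # "Pool" or "Volume"
    name: str    # e.g. "RKK61/DATA"
    err: int     # e.g. 20012
    source: str  # e.g. "beastTree.c[506]"
    block: int   # e.g. 22451084


def parse_log(text: str) -> list[NssError]:
    """Extract NSS pool/volume error entries from console-log text.

    Assumes entries are separated by blank lines and that the first line
    of each entry is the timestamp header; other entries are skipped.
    """
    errors = []
    for entry in re.split(r"\n\s*\n", text.strip()):
        lines = entry.strip().splitlines()
        if not lines:
            continue
        header = HEADER_RE.match(lines[0].strip())
        # Join the remaining lines so a message wrapped across lines still matches.
        body = NSS_RE.search(" ".join(line.strip() for line in lines[1:]))
        if header and body:
            code, kind, name, err, src, src_line, block = body.groups()
            errors.append(NssError(
                when=datetime.strptime(header.group(1), TS_FORMAT),
                code=code, kind=kind, name=name, err=int(err),
                source=f"{src}[{src_line}]", block=int(block),
            ))
    return errors


def outside_windows(errors, windows, slack=timedelta(minutes=10)):
    """Return the errors whose timestamps fall outside every (start, end)
    backup window, after widening each window by `slack` on both sides."""
    return [
        e for e in errors
        if not any(start - slack <= e.when <= end + slack for start, end in windows)
    ]
```

Anything returned by `outside_windows` is an error pair that does not line up with a backup session even allowing a few minutes either side; setting `slack=timedelta(0)` shows which pairs fall strictly between sessions. Grouping the parsed records by `block` would also show whether all 11 pairs point at the same block.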

The question is how to proceed. I've seen some material about verifying and
rebuilding the data pool, but I am reluctant, since some folks have
reported data loss.

Your thoughts are appreciated.

Good day.

Eric



Re: NSS-2.70-5009 pool & NSS-2.70-5008 volume errors

Hi,

(Further evidence that software RAID is better than hardware RAID and
that RAID 5 is basically a joke.)

After such an as-yet-unexplained catastrophe I would NOT have
confidence in the RAID array. Now, if you were 100% confident in the
array, you'd upgrade to NW65SP4a and then run

nss /poolrebuild=poolname

That will deactivate all of the volumes in that pool while the rebuild
is in progress.

Considering the guts were ripped out of the file system when the array
exploded, I'd imagine there IS corruption. However, running a pool
rebuild using 2-year-old NSS code is "not so schmart". So I'd be
inclined to apply SP4a first. Perhaps others would not.

-- Bob

- - - - - - - - - - - - - - - - -
Robert Charles Mahar
Traffic Shaping Engine for NetWare
http://www.TrafficShaper.com
- - - - - - - - - - - - - - - - -

Re: NSS-2.70-5009 pool & NSS-2.70-5008 volume errors

Eric S. Crawford wrote:

> I've seen some stuff about verifying and rebuilding the data pool
> though I am reluctant since some folks have reported data loss.


I would agree with Bob: you need to apply SP4a, try to get a good
backup (make sure you verify it), then cross your fingers and run
nss /poolrebuild=poolname

Normally a pool rebuild is quite safe, but given that your hardware
took a serious dump, I would definitely recommend that you make VERY
sure your backup is good, even if you have to run smaller backups of
each volume separately.

--
Joe Moore
Novell Support Forums SysOp
http://just.fdisk-it.com
http://www.caledonia.net/jmdns.html
http://www.caledonia.net/nesadmin.html
http://www.caledonia.net/jmttb.html

Re: NSS-2.70-5009 pool & NSS-2.70-5008 volume errors

Also check the firmware on the PERC; that sounds like the problems that
happened with 2.70.

Cheers Dave


--

Dave Parkes [NSCS]
Occasionally resident at http://support-forums.novell.com/

Re: NSS-2.70-5009 pool & NSS-2.70-5008 volume errors

Dave Parkes wrote:

> Also check the firmware on the Perc,


yup - that would be good 🙂

--
Joe Moore