changju Absent Member.

Anyone Willing to Test the Fix for NCS Instability Issue

Please send your kernel version (the output of "uname -a") to me (cgao@novell.com). I will build a (proposed) fix for you.

Regards,

Changju
changju Absent Member.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

Still no takers?

Here is what I found,

"CPU hotplug is busted (onlining of CPU1 kills the kernel)"

> Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by a reference
> to disable_apic_timer (labeled as __initdata) from the CPU initialization
> code.

More at:
Re: 2.6.23-rc8-mm2: problems on HP nx6325


Please let me know if you are willing to test the fix for us.
Knowledge Partner

Re: Anyone Willing to Test the Fix for NCS Instability Issue

I would, but we haven't started migrating our NetWare cluster to OES2 yet. If the patch isn't released by then, I'd be willing to try it.
Anonymous_User Absent Member.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

kjhurni wrote:

>
> I would, but we've not yet started migrating our NetWare cluster to OES2
> yet. but if the patch isn't released by that time, I'd be willing to
> try it.
>
>

I would be willing to test if it fixes slow cluster failovers. Otherwise I am
not having an issue with clustering here.

Bob-O-Rama
Visitor.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

changju;1970542 wrote:
Still no takers?

Here is what I found,

"CPU hotplug is busted (onlining of CPU1 kills the kernel)"

> Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by a reference
> to disable_apic_timer (labeled as __initdata) from the CPU initialization
> code.

More at:
Re: 2.6.23-rc8-mm2: problems on HP nx6325


Please let me know if you are willing to test the fix for us.


Can you explain the mechanism of action?

-- Bob
wmcrocker Absent Member.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

changju;1970542 wrote:
Still no takers?

Here is what I found,

"CPU hotplug is busted (onlining of CPU1 kills the kernel)"

> Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by a reference
> to disable_apic_timer (labeled as __initdata) from the CPU initialization
> code.

More at:
Re: 2.6.23-rc8-mm2: problems on HP nx6325


Please let me know if you are willing to test the fix for us.


Hi

What do you need from us, just our kernel version?

Cheers

Wayne
bretttarr Absent Member.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

Our kernel version is as follows:

Linux typhoon 2.6.16.60-0.60.1-bigsmp #1 SMP Tue Mar 9 09:44:12 UTC 2010 i686 i686 i386 GNU/Linux
Bob-O-Rama
Visitor.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

Bob-O-Rama;1971210 wrote:
Can you explain the mechanism of action?

-- Bob


Namely, how would this defect ever surface on production HW in the absence of an actual hot-plug of the CPU? You can trigger this issue manually if you enable / disable cores via:

echo 0 > /sys/devices/system/cpu/cpu1/online

for example, to turn off CPU 1. Then turn it back on. Boom. But this would not cause "slow" cluster failover; rather, the node with the hotplug event would drop like a rock. It's not subtle.
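
For anyone who wants to try this reproduction on a test node, here is a minimal sketch of the offline/online cycle wrapped in a function. The names cycle_cpu, CPU_SYSFS and CYCLE_DELAY are made up for illustration; only the sysfs "online" file comes from the command above. It assumes a kernel built with CPU hotplug support and has to run as root. On an affected kernel the second write is where the node would be expected to go down, so keep it away from production.

```shell
#!/bin/sh
# Sketch: take one CPU offline and bring it back via sysfs.
# CPU_SYSFS is an assumed override so the function can be dry-run
# against a scratch directory instead of the real sysfs tree.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

cycle_cpu() {
    cpu="$1"
    ctl="$CPU_SYSFS/cpu$cpu/online"

    # Refuse to continue if this CPU has no writable hotplug control.
    if [ ! -w "$ctl" ]; then
        echo "cpu$cpu: no writable online control at $ctl" >&2
        return 1
    fi

    echo 0 > "$ctl" || return 1
    echo "cpu$cpu offline, state=$(cat "$ctl")"

    # Assumed pause between the two writes; defaults to one second.
    sleep "${CYCLE_DELAY:-1}"

    echo 1 > "$ctl" || return 1
    echo "cpu$cpu online, state=$(cat "$ctl")"
}

# Example (as root, on a test node only):
#   cycle_cpu 1
```

If the function prints both lines and the node is still up, that CPU survived the cycle on that kernel.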

So if you can explain the relationship between the two... perhaps I'd be more eager to spend the time to set this up in test.

-- Bob
Knowledge Partner

Re: Anyone Willing to Test the Fix for NCS Instability Issue

Bob,

Bob-O-Rama wrote:
>
> for example to turn off CPU 1. And then turn it back on. Boom. But
> this would not cause "slow" cluster failover, but rather the node with
> the hotplug event would drop like a rock. Its not subtle.


The issue at hand here is neither subtle nor a slow failover. The
affected cluster nodes *do* drop like rocks.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de
Bob-O-Rama
Visitor.

Re: Anyone Willing to Test the Fix for NCS Instability Issue

mrosen;1973268 wrote:
Bob,

The issue at hand here is not subtle or slowness of a failover. The affected cluster nodes *do* drop like rocks.



That's what I was getting at - in response to warper2 - though I think I plopped it under the wrong branch.

So is the issue some sort of dynamic power management hot-plugging the CPUs? Because I have a hard time understanding how this happens as a part of "normal" operations of OES or NCS.

I'm all in favor of patching things, but in this case, the cure should be related to the disease. And unless the CPUs are actually being hot-plugged, it's hard to understand the relevance.
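
One way to check whether CPUs are actually going offline and online during normal operation would be to poll the per-CPU sysfs "online" files and log any change. This is a rough sketch, not a Novell tool: cpu_states, watch_cpu_states, CPU_SYSFS and WATCH_INTERVAL are hypothetical names, and polling can miss a transition shorter than the interval.

```shell
#!/bin/sh
# Sketch: report the online state of every CPU that exposes a hotplug control.
# CPU_SYSFS is an assumed override so the functions can be pointed at a
# scratch directory for a dry run.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Print one "cpuN=state" line per CPU that has a readable online file.
cpu_states() {
    for ctl in "$CPU_SYSFS"/cpu[0-9]*/online; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$ctl" ] || continue
        name=$(basename "$(dirname "$ctl")")
        echo "$name=$(cat "$ctl")"
    done
}

# Print a timestamped line whenever the set of states changes.
# Runs until interrupted; WATCH_INTERVAL (seconds) is an assumed knob.
watch_cpu_states() {
    prev=""
    while :; do
        cur=$(cpu_states | tr '\n' ' ')
        if [ "$cur" != "$prev" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $cur"
            prev="$cur"
        fi
        sleep "${WATCH_INTERVAL:-5}"
    done
}

# Example: watch_cpu_states >> /tmp/cpu-states.log
```

On some kernels cpu0 has no "online" file, so it may not appear in the output. If the log never shows a state change before a node drops, that would be some evidence that hotplug is not what triggers the failure.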

-- Bob
Knowledge Partner

Re: Anyone Willing to Test the Fix for NCS Instability Issue

Bob,

Bob-O-Rama wrote:
>
> mrosen;1973268 Wrote:
> > Bob,
> >
> > The issue at hand here is not subtle or slowness of a failover. The
> > affected cluster nodes *do* drop like rocks.
> >
> >

>
> Thats what I was getting at - in response to warper2 - though I think I
> plopped it under the wrong branch.
>
> So is the issue some sort of dynamic power management hot-plugging the
> CPUs? Because I have a hard time understanding how this happens as a
> part of "normal" operations of OES or NCS.


I have *no* idea, but admittedly I wondered the same. Of course, modern
Xeon CPUs have very sophisticated power management techniques, but I'm
not aware that they dynamically drop individual cores altogether.

> I'm all in favor of patching things, but in this case, the cure should
> be related to the disease. And unless the CPUs are actually being
> hot-plugged, its hard to understand the relevance.


Agreed.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de