cfastner Absent Member.

OES update causes cluster to not mount?

I have two SUSE Linux servers sharing a cluster.

versions:
SUSE Linux Enterprise Server 10 SP2
Linux version 2.6.16.54-0.2.12-bigsmp (geeko@buildhost)
OES 2 VERSION = 2 BUILD = FCS

I did a software update using YaST and after the update the cluster resources will not load.

It appears to me that I may have a problem with EVMS.
I have compared EVMS engine logs from when it worked to when it quit. They are identical until I get to this error:

_0_ Engine: engine_user_message: Message is: Engine: The plug-in Novell-NCS in module /lib/evms/2.5.5/ncs-1.0.0.so failed to load. The plug-in's setup_evms_plugin() function failed with error code 13: Permission denied.
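
When comparing engine logs by hand, it can help to pull out only the plug-in load failures. The filter below is a minimal sketch: the function name `evms_plugin_failures` is hypothetical, the message layout is assumed to match the line quoted above, and `/var/log/evms-engine.log` is an assumed log location that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: reads EVMS engine log text on stdin and prints
# "<plug-in> <module path> <error code>" for each plug-in load failure.
# Assumes the message format shown in the log line quoted in this post.
evms_plugin_failures() {
    sed -n 's/.*The plug-in \(.*\) in module \(.*\) failed to load\..*error code \([0-9]*\).*/\1 \2 \3/p'
}

# Example (log path is an assumption, adjust as needed):
#   evms_plugin_failures < /var/log/evms-engine.log
```

Running it against both the working and the failing log shows at a glance whether the set of failing plug-ins changed after the update.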

Being relatively new to Linux I don't exactly know what I have to do to recover the Cluster. I could use ANY troubleshooting assistance! (I have also been very cautious about attempting anything for fear of corrupting the volume).

What steps can I take to determine where the actual problem is?

Thanks for any assistance you can provide,

Charlie
~~~
changju Absent Member.

Re: OES update causes cluster to not mount?

Error 13 means the process failed to open the proc file "/proc/ncs/evms/engine.msg" (or "/proc/ncs/evms/daemon.msg").

You can run "lsof | grep evms" to see which process has it open. However, I suspect a mismatch between the kernel modules and the running kernel is causing the problem. Normally a "rcnovell-ncs restart" or a server reboot will fix it.
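
One way to test the mismatch theory is to compare the kernel release a module was built for with the running kernel. The sketch below is only an illustration: `vermagic_matches` is a hypothetical helper name, and whether `modinfo` can find an NCS module under the name `cma` on a given node is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a module's vermagic string was built
# for the given kernel release.
# Usage: vermagic_matches "<kernel release>" "<vermagic string>"
vermagic_matches() {
    release=$1
    # The kernel release is the first space-separated field of vermagic.
    built_for=${2%% *}
    [ "$release" = "$built_for" ]
}

# Example on a live node (module name cma is an assumption):
#   vermagic_matches "$(uname -r)" "$(modinfo -F vermagic cma)" \
#       && echo "cma matches the running kernel" \
#       || echo "cma was built for a different kernel"
```

If the two differ, the module on disk does not belong to the kernel that is currently booted, which would be consistent with a mismatch after the update.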

Best regards,

Changju

~~~
cfastner Absent Member.

Re: OES update causes cluster to not mount?

I have rebooted this server several times and the cluster never comes back. I executed "rcnovell-ncs restart" with the following messages:

Starting Novell Cluster Services
Mounting adminfs at /admin ... already mounted done
done
senfs9:~ # Joining...
Now a member of
This node is not a member of a cluster

This MAY be a kernel mismatch but I don't know how to fix this. There is a menu.lst.old file that I think may have been created when I ran the updates. This includes several menu options for the older kernel. If I add this information to the menu will it load the old kernel (and get me back to where I WAS) or will this cause OTHER issues with boot up?

Thanks for the assistance!

Charlie
~~~
changju Absent Member.

Re: OES update causes cluster to not mount?

Please try three simple things first.

1. Make sure NCS modules are loaded

Run command "lsmod | grep cma"

If you get something like this, they are loaded.

#15 root@CG_01:~ # lsmod | grep cma
cma 122140 1
crm 49696 1 cma
css 35480 4 cma,cmsg,crm,cvb
vipx 10904 4 cma,cmsg,crm,css
vll 43524 8 cma,cmsg,crm,cvb,css,vipx,sbd,gipc
clstrlib 692588 10 cma,cmsg,crm,cvb,css,vipx,sbd,gipc,vll,sbdlib
adminfs 39556 6 cma,cvb,sbd,gipc,clstrlib

2. Make sure the SBD partition is present

Run command "sbdutil -v" and you should get something like this

#16 root@CG_01:~ # sbdutil -v

Cluster (SBD) partition on /dev/evms/.nodes/cgao_sp2_gmc3_cluster.sbd.

Signature # HeartBeat State eState Epoch SbdLock Bitmask
SBD* 0 00059328 LIVE 6 LOCK 0000001F
SBD* 1 00059101 LIVE 6 LOCK 0000001F
SBD* 2 00059017 LIVE 6 LOCK 0000001F
SBD* 3 00059007 LIVE 6 LOCK 0000001F
SBD* 4 00021147 LIVE 6 LOCK 0000001F
SBD* 5 00000001 0 UNLK 00000000
SBD* 6 00000001 0 UNLK 00000000
SBD* 7 00000001 0 UNLK 00000000
SBD* 8 00000001 0 UNLK 00000000
SBD* 9 00000001 0 UNLK 00000000
SBD* 10 00000001 0 UNLK 00000000
SBD* 11 00000001 0 UNLK 00000000
SBD* 12 00000001 0 UNLK 00000000
SBD* 13 00000001 0 UNLK 00000000
SBD* 14 00000001 0 UNLK 00000000
SBD* 15 00000001 0 UNLK 00000000
SBD* 16 00000001 0 UNLK 00000000
SBD* 17 00000001 0 UNLK 00000000
SBD* 18 00000001 0 UNLK 00000000
SBD* 19 00000001 0 UNLK 00000000
SBD* 20 00000001 0 UNLK 00000000
SBD* 21 00000001 0 UNLK 00000000
SBD* 22 00000001 0 UNLK 00000000
SBD* 23 00000001 0 UNLK 00000000
SBD* 24 00000001 0 UNLK 00000000
SBD* 25 00000001 0 UNLK 00000000
SBD* 26 00000001 0 UNLK 00000000
SBD* 27 00000001 0 UNLK 00000000
SBD* 28 00000001 0 UNLK 00000000
SBD* 29 00000001 0 UNLK 00000000
SBD* 30 00000001 0 UNLK 00000000
SBD* 31 00000001 0 UNLK 00000000


3. If 1 and 2 are good, make sure the firewall is off and that the node has the right IP addresses and netmasks.
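
The first two checks above can be sketched as small filters that read the command output on stdin, so they can also be tried against saved output. The function names are hypothetical, and the SBD check assumes the "Cluster (SBD) partition on ..." line looks like the sample above.

```shell
#!/bin/sh
# Check 1: succeeds if lsmod output lists a module named exactly "cma".
# (A plain "grep cma" also matches lines that only mention cma as a user.)
ncs_modules_loaded() {
    awk '$1 == "cma" { found = 1 } END { exit !found }'
}

# Check 2: prints the SBD partition path from "sbdutil -v" output and
# fails if no such line is present.
sbd_partition() {
    sed -n 's/^Cluster (SBD) partition on \(.*\)\.$/\1/p' | grep .
}

# Example on a cluster node:
#   lsmod | ncs_modules_loaded || echo "NCS modules are not loaded"
#   sbdutil -v | sbd_partition || echo "no SBD partition found"
```

For the third check, "ip addr show" lists the configured addresses and prefix lengths; on SLES the firewall state can usually be queried with "rcSuSEfirewall2 status", though that script name is an assumption about your installation.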

If the problems persist, please contact Novell Technical Services (NTS).

Best regards,

Changju

~~~
cfastner Absent Member.

Re: OES update causes cluster to not mount?

I finally called Novell about this problem. A very friendly and competent Tech Support Engineer named Mark Russell assisted me. I found out that the issue was a proprietary HP driver for our SAN that didn't play well with the kernel update. Mark helped me make changes to get the server operational again, and several days later Mark (with further assistance from another Novell tech named Dell) helped me set up multipath redundancy.

Thanks for all of the assistance online.

Charlie
~~~