sma2006 Outstanding Contributor.

OES2 Linux Cluster - slow resource migration

I have migrated a few NetWare 6.5 SP8 clusters to OES2 SP2 Linux clusters this year, and the average time to migrate a pool resource seems to be higher on Linux.

We used to migrate NetWare pools in less than 5 seconds; now the minimum time is 20-30 seconds (sometimes more) with OES2 SP2 Linux.

We have seen this problem on several different hardware types, both physical and virtual, and with different volume sizes.

Can anyone confirm this behaviour, or has anyone found any tips to increase the migration speed?

Thanks

Sylvain
Micro Focus Contributor

Re: OES2 Linux Cluster - slow resource migration

My resources on OES2 SP2 Linux migrate in 5-10 seconds. I don't know if this is common, but it seems you might be able to speed yours up. You could tail the unload.out and load.out files on the source and target nodes during migration of the resource to see which commands are taking the longest to run. This might help you find a way to speed it up. The load.out and unload.out files are in the /var/opt/novell/log/ncs directory.
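For example, to watch them during a migration (the exact file names include the resource name; ADM_SERVER below is just a placeholder):

# on the source node
tail -f /var/opt/novell/log/ncs/ADM_SERVER.unload.out

# on the target node
tail -f /var/opt/novell/log/ncs/ADM_SERVER.load.out

The "date" lines the scripts write between commands make it easy to spot where the time is going.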
Knowledge Partner

Re: OES2 Linux Cluster - slow resource migration

Are you verifying the time via iManager results? I ask because iManager defaults to refresh every 30 seconds.

You can also issue this on the source/target servers:

tail -f /var/log/messages

and you can get a real-time feel for how long it's really taking to load/unload the resources.
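If /var/log/messages is busy, you can also filter it down to just the pool/volume you're moving, for example (substitute your own pool/volume names for the pattern):

tail -f /var/log/messages | grep -iE 'admpool|adm'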

I've found that in our environment, GroupWise loads/unloads much quicker on OES2 Linux.
However, I've had mixed results with the NSS volumes, although I never did extensive testing in either environment. My guess is that with our NSS volumes it may be due to the time of day and how many open files were on the system.
Micro Focus Contributor

Re: OES2 Linux Cluster - slow resource migration

"watch -n1 cluster status" is a great way to watch the resource status.
That will refresh every one second.
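If you want a rough timing from the command line rather than from iManager, something along these lines works (the resource and node names are just placeholders; check the cluster man page for the exact migrate syntax):

date; cluster migrate ADM_SERVER spcsl01
watch -n1 cluster status

Note the timestamp, then watch for the resource to reach the Running state on the target node.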
skapanen2 Absent Member.

Re: OES2 Linux Cluster - slow resource migration

On 21.12.2010 20:06, kjhurni wrote:
>
> I've found that in our environment, groupwise loads/unloads much
> quicker on OES2 Linux.


Same here; GroupWise resources migrate faster than they did on NetWare.

But yes, sometimes it takes a long time to migrate NSS resources; unloading the volume is what takes the time.

-sk

HAMK University - OES, NW, GW, NCS, eDir, Zen, IDM, NSL - www.hamk.fi
sma2006 Outstanding Contributor.

Re: OES2 Linux Cluster - slow resource migration

Hello,

Thanks for the information.


Here are the message log and .out files from the source and destination cluster nodes for the ADM pool (300 GB).

It seems there is a delay (about 30 seconds) between the time the resource is unloaded on the source node and the time it starts reloading on the target.

Any ideas?

Thanks


RESOURCE UNLOAD ON SPCSL03:

UNLOAD.OUT

CRM: Wed Dec 22 15:14:07 2010
++ NCSVAR_IP_ADDR=/etc/ha.d/resource.d/IPaddr2
++ NCSVAR_FILE_SYSTEM=/etc/ha.d/resource.d/Filesystem
++ NCSVAR_OCF_DIR=/usr/lib/ocf/resource.d/heartbeat
++ PATH=/sbin:/usr/sbin:/usr/local/sbin:/opt/gnome/sbin:/root/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/novell/eDirectory/bin:/opt/novell/eDirectory/sbin:/opt/nsr:/opt/novell/migration/sbin:/opt/novell/ncl/bin:/opt/novell/sms/bin:/etc/opt/emcpower/bin:/opt/nsr:/opt/novell/afptcpd/bin/:/opt/novell/bin
+ ignore_error ncpcon unbind --ncpservername=ADM_SERVER --ipaddress=10.78.87.23
+ ncpcon unbind --ncpservername=ADM_SERVER --ipaddress=10.78.87.23
... Executing " unbind"

... completed OK [elapsed time = 102 msecs 182 usecs]
+ date
Wed Dec 22 15:14:07 CET 2010
+ return 0
+ ignore_error del_secondary_ipaddress 10.78.87.23
+ del_secondary_ipaddress 10.78.87.23
+ local ip=10.78.87.23
+ shift
+ '[' -n '' ']'
+ local other_options=
++ expr match '' '.*\(\<dev\s\+\w\+\>\).*'
+ local dev_name=
+ '[' -z '' ']'
+++ expr match 10.78.87.23 '\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*'
++ grep -o '^[0-9]\+:[[:space:]]\+[[:alnum:]]\+:'
++ ip addr show to 10.78.87.23
+ dev_name='6: bond0:'
++ expr match '6: bond0:' '[0-9]\+:\s\+\(\w\+\):'
+ dev_name=bond0
+ '[' -n bond0 ']'
+ dev_name='dev bond0'
++ expr match 10.78.87.23 '.*\(\/[0-9]\+\)'
+ '[' -z '' ']'
++ ip addr show dev bond0
++ grep -m1 -o '^[[:space:]]*inet[[:space:]]\+[^[:space:]]\+'
++ grep -o '\/[0-9]\+'
+ local mask_width=/17
+ ip=10.78.87.23/17
+ ip -f inet addr del 10.78.87.23/17 dev bond0
+ date
Wed Dec 22 15:14:07 CET 2010
+ return 0
+ ignore_error nss /pooldeact=ADMPOOL
+ nss /pooldeact=ADMPOOL
+ date
Wed Dec 22 15:14:10 CET 2010
+ return 0
+ exit 0



MESSAGE LOG


Dec 22 15:14:07 spcsl03 adminus daemon: umounting volume ADM lazy=1
Dec 22 15:14:09 spcsl03 httpstkd[6934]: DATE=20101222141409 GMT HOST=spcsl03 PROG=HTTPSTK LVL=Usage SRC.IP="10.78.59.3" SRC.PORT=1245 PROT="HTTP" MSG="error code 401" STAT="GET" CMD="/NWHealth/UPDATE/CPUUPDATE" DUR=0
Dec 22 15:14:10 spcsl03 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Dec 22 15:14:10 spcsl03 kernel: Pool "ADMPOOL" - MSAP deactivate.





RESOURCE LOAD ON SPCSL01:


LOAD.OUT

Wed Dec 22 15:14:41 CET 2010
+ '[' '!' 0 -eq 0 ']'
+ return 0
+ exit_on_error ncpcon mount ADM=252
+ ncpcon mount ADM=252
... Executing " mount ADM=252"


The following volume were mounted:
ADM ID:252
1 volume were mounted.

... completed OK [elapsed time = 8 Seconds 18446744073709066 msecs 858 usecs]
+ rc=0
+ date
Wed Dec 22 15:14:49 CET 2010
+ '[' '!' 0 -eq 0 ']'
+ return 0
+ exit_on_error add_secondary_ipaddress 10.78.87.23
+ add_secondary_ipaddress 10.78.87.23
+ local ip=10.78.87.23
+ shift
++ expr match 10.78.87.23 '\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*'
+ local destination=10.78.87.23
+ '[' -n '' ']'
+ ping -c 1 -q -n 10.78.87.23
PING 10.78.87.23 (10.78.87.23) 56(84) bytes of data.

--- 10.78.87.23 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

+ local other_options=
++ expr match '' '.*\(\<dev\s\+\w\+\>\).*'
+ local dev_name=
+ '[' -z '' ']'
++ ip route get 10.78.87.23
++ grep -o '\<dev [[:alnum:]]\+\>'
+ dev_name='dev bond0'
++ expr match 10.78.87.23 '.*\(\/[0-9]\+\)'
+ '[' -z '' ']'
++ ip addr show dev bond0
++ grep -m1 -o '^[[:space:]]*inet[[:space:]]\+[^[:space:]]\+'
++ grep -o '\/[0-9]\+'
+ local mask_width=/17
+ ip=10.78.87.23/17
++ expr match '' '.*\(\<brd\s\+\S\+\).*'
+ local brd_addr=
+ '[' -z '' ']'
++ expr match '' '.*\(\<broadcast\s\+\S\+\).*'
+ brd_addr=
+ '[' -z '' ']'
+ brd_addr='brd +'
+ ip -f inet addr add 10.78.87.23/17 brd + dev bond0
+ local rc=0
+ '[' 0 -eq 0 ']'
+ arping -c 1 -A -I bond0 10.78.87.23
ARPING 10.78.87.23 from 10.78.87.23 bond0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ arping -c 1 -U -I bond0 10.78.87.23
ARPING 10.78.87.23 from 10.78.87.23 bond0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ return 0
+ rc=0
+ date
Wed Dec 22 15:14:59 CET 2010
+ '[' '!' 0 -eq 0 ']'
+ return 0
+ exit_on_error ncpcon bind --ncpservername=ADM_SERVER --ipaddress=10.78.87.23
+ ncpcon bind --ncpservername=ADM_SERVER --ipaddress=10.78.87.23
... Executing " bind"

... completed OK [elapsed time = 453 usecs]
+ rc=0
+ date
Wed Dec 22 15:14:59 CET 2010
+ '[' '!' 0 -eq 0 ']'
+ return 0
+ exit 0









MESSAGE LOG

Dec 22 15:14:11 spcsl01 ncs-resourced: Try LDAP for ADMPOOL_SERVER
Dec 22 15:14:11 spcsl01 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Dec 22 15:14:11 spcsl01 kernel: Pool "ADMPOOL" - MSAP activate.
Dec 22 15:14:11 spcsl01 kernel: Server(6723064c-c760-102e-a1-3d-d485646b0e08) Cluster(00000000-0000-0000-00-00-000000000000)
Dec 22 15:14:11 spcsl01 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Dec 22 15:14:11 spcsl01 kernel: Pool "ADMPOOL" - Watching pool.
Dec 22 15:14:11 spcsl01 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Dec 22 15:14:11 spcsl01 kernel: Pool "ADMPOOL" - Probe request generated.
Dec 22 15:14:11 spcsl01 kernel: Server(6723064c-c760-102e-a1-3d-d485646b0e08) Cluster(00000000-0000-0000-00-00-000000000000)
Dec 22 15:14:11 spcsl01 kernel: MSAP Server(14135a1c-c781-102e-b2-17-d48564676c12) MSAP Cluster(00000000-0000-0000-00-
Dec 22 15:14:41 spcsl01 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Dec 22 15:14:41 spcsl01 kernel: Pool "ADMPOOL" - Pool not in use by another server.
Dec 22 15:14:41 spcsl01 adminus daemon: Volume state change request for ADM from NCP
Dec 22 15:14:41 spcsl01 adminus daemon: mounting volume ADM with extra options (null)
Dec 22 15:14:46 spcsl01 adminus daemon: Mount table (fstab) updating on volume mount.
pcoen Absent Member.

Re: OES2 Linux Cluster - slow resource migration

I'm coming into this a little late, but could this be a result of the /MetadataGroupTime and /MetadataGroupWriteLimit defaults? If you have a fair amount of changes occurring, there could be a decent amount of data to flush. By default, the metadata flush timer fires every 40 seconds, and until that flush completes the volume won't migrate over. You could try bringing the time or the size limit down, but it might have a negative impact on performance. There's a note in the NSS File System Administration Guide for Linux about the effect of the MetadataGroupTime parameter on volume migration in a cluster.
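For anyone who wants to experiment, the switches can be changed at runtime with the same nss command syntax the cluster scripts use; the values below are only an example, not a recommendation, and I believe persistent settings go in /etc/opt/novell/nss/nssstart.cfg (check the admin guide for the exact persistence syntax):

nss /MetadataGroupTime=20
nss /MetadataGroupWriteLimit=10000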
Knowledge Partner

Re: OES2 Linux Cluster - slow resource migration

Nice find! I'll take a peek at that myself.
Micro Focus Expert

Re: OES2 Linux Cluster - slow resource migration

Has anybody decreased the MetadataGroupTime parameter and, if so, what was the impact on file system performance?
Laura Buckley

Views/comments expressed here are entirely my own.
If you find this post helpful, please show your appreciation and click on "Like" below...
sma2006 Outstanding Contributor.

Re: OES2 Linux Cluster - slow resource migration

laurabuckley;2084820 wrote:
Has anybody decreased the MetadataGroupTime parameter and, if so, what was the impact on file system performance?


Unfortunately, we have not been able to try this yet.
Knowledge Partner

Re: OES2 Linux Cluster - slow resource migration

I've left mine the default and so far haven't had any issues.

I have several large data volumes that migrate relatively quickly. For example, I have a "home" volume of about 800 GB with about 12 million files and at least 2500 trustees. Migrates from server to server in about 15-20 seconds (it varies depending upon the time of day that I migrate it, etc.)

Keep in mind that iManager defaults to a 30 second refresh interval.
sma2006 Outstanding Contributor.

Re: OES2 Linux Cluster - slow resource migration

kjhurni;2085438 wrote:
I've left mine the default and so far haven't had any issues.

I have several large data volumes that migrate relatively quickly. For example, I have a "home" volume of about 800 GB with about 12 million files and at least 2500 trustees. Migrates from server to server in about 15-20 seconds (it varies depending upon the time of day that I migrate it, etc.)

Keep in mind that iManager defaults to a 30 second refresh interval.


We also see 15-30 second migration times with OES2 Linux, but we used to have less than 5 seconds with NetWare, which is why I wonder whether this is normal.
A migration time under 5 seconds lets you migrate resources during working hours; it is much harder to do that during the day when a migration takes 30 seconds.
warper2 Super Contributor.

Re: OES2 Linux Cluster - slow resource migration

sma wrote:

>
> kjhurni;2085438 Wrote:
>> I've left mine the default and so far haven't had any issues.
>>
>> I have several large data volumes that migrate relatively quickly. For
>> example, I have a "home" volume of about 800 GB with about 12 million
>> files and at least 2500 trustees. Migrates from server to server in
>> about 15-20 seconds (it varies depending upon the time of day that I
>> migrate it, etc.)
>>
>> Keep in mind that iManager defaults to a 30 second refresh interval.

>
> We also have 15-30 sec migration time with OES2 Linux, but we used to
> have less than 5 sec. with NetWare and that's why I wonder if this is
> normal.
> Less than 5 sec. migration time allows you to migrate resources during
> working hours and it's really more difficult to migrate during the day
> when migration take 30 sec.
>
>


I have complained about this for I don't know how long. Maybe OES 3, or whatever they call it, will finally fix it again.

Knowledge Partner

Re: OES2 Linux Cluster - slow resource migration

In my case that 15-30 second migration for THAT particular resource is 3x faster than on NetWare. With NetWare I'd have to wait over a minute every time just for it to "offline" the source and begin to load it on the secondary node.

My other "smaller" volumes/resources migrate fairly quickly (5-10 seconds).
sma2006 Outstanding Contributor.

Re: OES2 Linux Cluster - slow resource migration

I tried changing both MetadataGroupTime and MetadataGroupWriteLimit (20 seconds and 10000), but it does not affect the time to migrate the resource.
Actually, these parameters seem to affect the time to load a volume only when there is dirty cache in the journal after a server crash, which is not the case in my migration tests.
So I still see a minimum of 30-40 seconds to migrate an OES2 Linux resource, compared to 5-10 seconds for the same resource on the same hardware with NetWare (in a mixed-node cluster...).

Hope this helps.