We are experiencing the following issue. In our OML 9 we have a virtual node defined by a cluster group and two physical servers. Based on our experience we expect the deployed policies to be automatically disabled on the inactive (OFFLINE) node. Although this works in other cluster environments, we have one case where the setup does not work properly. The cluster software used in all cases is VCS (Veritas Cluster Server); the OS is RHEL Linux.
Could you please provide more details as to what exactly is not working properly?
In order for the Cluster Awareness (Claw) to disable and enable the policies, the policies need to be assigned to the virtual node.
Make sure the policies are only assigned to the virtual node. If a policy is assigned to the physical node and the virtual node, it will stay enabled.
You can check on the agent if a policy was assigned to a HARG using ovpolicy:
# ovpolicy -list -level 4
monitor "distrib_mon" enabled 0009.0000
policy id : "6ac5c3bc-e455-11dc-808e-00306ef38b73"
owner : "OVO:tobias"
category (1): "examples"
attribute (1) : "product_id" "ovoagt"
attribute (2) : "checksum_header" "73216b61c54e950fc852c7d8292de7c408b8cadf"
attribute (3) : "version_info" ""
attribute (4) : "version_id" "6ab89412-e455-11dc-9d57-00306ef38b73"
attribute (5) : "HARG:ov-server" "no_value"
Policies that were assigned to a virtual node have an attribute with the name "HARG:<HARG-name>".
In my case the HARG name is ov-server.
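With many policies installed, scanning the level-4 listing by eye gets tedious. Here is a minimal awk sketch (assuming the output format shown above; field positions can differ slightly between agent versions) that prints each policy name with a HARG/no-HARG flag:

```shell
# harg_summary: read 'ovpolicy -list -level 4' output on stdin and print
# each policy name followed by "HARG" or "no-HARG", depending on whether
# a HARG:<name> attribute was seen for that policy.
harg_summary() {
  awk '
    # A policy header line looks like: monitor "name" enabled 0009.0000
    $2 ~ /^"/ && $3 == "enabled" {
        if (name != "") print name, (harg ? "HARG" : "no-HARG")
        name = $2; harg = 0
    }
    # An attribute line carrying the HARG marker contains "HARG:"
    /HARG:/ { harg = 1 }
    END { if (name != "") print name, (harg ? "HARG" : "no-HARG") }
  '
}
```

Typical use on the agent: `ovpolicy -list -level 4 | harg_summary`.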
Next, please check that the cluster was detected on the managed node:
# ovclusterinfo -a
If it returns an error, the cluster version may not be supported or recognized by the agent version in use.
What agent version are you using?
# opcagt -version
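The success or failure of ovclusterinfo can also be wrapped into a quick scripted health check. A hedged sketch (the path /opt/OV/bin/ovclusterinfo is the usual default; adjust it if your install differs):

```shell
# claw_check: run the supplied cluster-info command and report whether
# the agent's Cluster Awareness can see the cluster on this node.
claw_check() {
  if "$@" >/dev/null 2>&1; then
    echo "cluster detected"
  else
    echo "cluster NOT detected - check the support matrix for your agent/cluster combination"
  fi
}

# Typical call on a managed node:
# claw_check /opt/OV/bin/ovclusterinfo -a
```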
I did the checks as suggested and here is the output.
ovpolicy -list -level 4 (showing only 1 policy in the output below)
* List installed policies for host 'localhost'.
configfile "GBL_Linux_OA_ParmPolicy" enabled 1100.0001
policy id : "1ea23bbc-5a40-71e2-0559-17fc19cb0000"
owner : "OVO:DHL_PRG_OML"
category (1): "HPOpsAgt"
attribute (1) : "product_id" ""
attribute (2) : "checksum_header" "12896ac3017c7a55daed46300f1abc175ce278ed"
attribute (3) : "creation_date" "1316647563"
attribute (4) : "creation_user" "MSSPINT12\Administrator (MSSPINT12)"
attribute (5) : "version_info" ""
attribute (6) : "version_id" "1ea63c6c-5a40-71e2-0559-17fc19cb0000"
configsettings "OVO settings" enabled 1
policy id : "a1b6413e-f15e-11d6-83d0-001083fdff5e"
owner : "OVO:DHL_PRG_OML"
category : <no categories defined>
attribute : <no attributes defined>
ERROR: (conf-599) Cluster exception.
(conf-236) Can not get the state of the local node.
I did not find a HARG attribute in the output of the first command, and the command ovclusterinfo -a also resulted in an error.
Yes, the HARG attribute is not present. That means the policy was either assigned only to the physical node(s), or assigned to both the physical and the virtual nodes. To check what a policy is assigned to, go to the policy bank or All policies in the AdminUI and select "Direct node(group) assignments" in the View menu.
The agent version on the managed node is pretty old. Possibly that agent version doesn't support the cluster version. To check which HA versions are supported, consult SUMA (the support matrix):
Select HP Operations Agent as the product and then go to the High-Availability section.
The policies were directly assigned only to the virtual node. No policies were assigned directly to the physical ones.
The interesting thing is that we are using the same setup with the same agent version elsewhere, and the cluster-aware setup works perfectly there (although the command ovclusterinfo -a fails there with the same error).
The only difference I could spot is that the problematic cluster runs on RHEL 6.1, while the one where everything works fine is on RHEL 6.3. But I can't believe that this could be the root cause...
We have recently experienced an issue on the "working cluster" as well. When one of the nodes failed and the cluster failed over, the policies were automatically enabled on the active node, but the policies on the inactive node were not DISABLED, resulting in false alerts. Could this be related to the fact that the command ovclusterinfo -a does not work?
I think there are two independent issues:
1. Cluster awareness (Claw) doesn't work
This is indicated by "ovclusterinfo -a" failing. If ovclusterinfo -a doesn't work, then enabling and disabling policies probably won't work either. I'm surprised to hear that it does.
And the reason why it's failing is probably because the cluster is not supported by your agent version.
2. No HARG in ovpolicy output
Regardless of whether Claw works or not, the HARG attribute should be present.
Perhaps you did not specify a HARG (cluster package) for the virtual node in the node bank.
ad 1. Is there something I can do (a configuration change) to make the command work?
Regarding your point that the cluster is not supported by our agent: is there an agent version that supports this configuration?
[root@czstlls069 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.1 (Santiago)
[root@czstlls069 ~]# /opt/VRTSvcs/bin/had -version
Engine Version 6.0
Join Version 184.108.40.206
Build Date Fri 11 Jan 2013 01:00:01 AM CET
PSTAMP 6.0.300.000-GA-2013-01-10-16.00.01
According to the SUMA I have just downloaded, there is currently no agent supporting this combination of OS and VCS. Can you confirm?
ad 2. We have a lot of cluster environments where the monitoring works fine. The problem exists only here; we have double-checked the OML setup and everything seems fine.
ad 1. Yes, that's correct. There is currently no agent version that supports that combination.
I found an existing Enhancement Request for VCS 6.0.1 support:
QCCR1A169008 Operations Agent support needed on Veritas Cluster 6.0.1 on RHEL 6.4
You can register for that ER to be notified when it is released.
ad 2. Can you please run this command to show if cluster package is defined correctly?
# /opt/OV/bin/OpC/utils/opcnode -list_virtual node_name=<virtual-node-name>
# /opt/OV/bin/OpC/utils/opcnode -list_virtual node_name=prgnbu2013
Attributes of virtual node 'prgnbu2013.gcc.dhl.com'
==========
cluster_package=nbumas2013
node_list="prgprod83.dhl.com prgproddr69.dhl.com"
Operation successfully completed.
# /opt/OV/bin/OpC/utils/opcnode -list_virtual node_name=prgnbunfe
Attributes of virtual node 'prgnbunfe.dhl.com'
==========
cluster_package=nfenbumas
node_list="prgprod101.dhl.com prgproddr101.dhl.com"
Operation successfully completed.
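One thing worth cross-checking here is that the cluster_package value reported by opcnode matches the HARG name that should appear in the ovpolicy output on the agent. A small sketch (assuming the opcnode output format shown above):

```shell
# harg_from_opcnode: print the cluster_package= value from
# 'opcnode -list_virtual' output read on stdin. This name should match
# the HARG:<name> attribute in 'ovpolicy -list -level 4' on the agent.
harg_from_opcnode() {
  awk -F= '/^cluster_package=/ { print $2 }'
}
```

Typical use: `/opt/OV/bin/OpC/utils/opcnode -list_virtual node_name=prgnbu2013 | harg_from_opcnode`.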
That looks all good. I would try these things:
1. Distribute with -force update to the virtual node
Verify with ovpolicy -list -level 4
2. De-assign / re-assign the policy to the virtual node and distribute again
Verify with ovpolicy -list -level 4
3. If the previous steps fail, you could try this to re-create the resolved assignments:
That will re-create the resolved node-to-policy assignments (opc_node_config), so the next distribution will re-distribute all the policies even without -force.
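After each redistribution (and after a failover) it helps to compare the enabled/disabled counts on both physical nodes; on the OFFLINE node, everything assigned via the HARG should show as disabled. A minimal sketch over the ovpolicy listing (assuming the state appears in the third field, as in the output shown earlier):

```shell
# policy_state_count: summarize how many policies the 'ovpolicy -list'
# output on stdin reports as enabled vs disabled.
policy_state_count() {
  awk '$3 == "enabled"  { e++ }
       $3 == "disabled" { d++ }
       END { printf "enabled=%d disabled=%d\n", e + 0, d + 0 }'
}

# Run on each physical node:
# ovpolicy -list | policy_state_count
```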
We have tried steps 1 and 2 with no luck; the situation stays the same.
Step 3 is something I don't dare to do on a server that monitors our whole production environment.
I will raise an enhancement request for this platform combination.