

Absent Member..
2009-04-28
07:47
298 views
How to configure an Automatic Action to restart the HP ITO Agent on a node
Hello,
We are running OVOW 7.5 on a Windows 2003 SP1 server.
We have another server to which various instrumentation files are deployed so that it can poll nodes and perform the checks defined in those files.
Owing to a misconfiguration in one of the scripts, the OVOW Action Agent or the Message Agent appears to buffer events in the \Opc\tmp folder, causing an aao*** or mao*** file to grow until it consumes the entire disk and impairs the server's ability to carry out the rest of its polling and check tasks.
We have found that manually restarting the HP ITO Agent service via the Services applet automatically deletes the offending aao*** or mao*** file from the \Opc\tmp directory.
Because this disk utilization issue occurs often, I have an SNMP-based threshold policy that checks the utilization of the C: drive every 15 minutes, and I would like to add an Automatic Action to restart the HP ITO Agent service.
I have tried the following in the Automatic Action field:
net stop "HP ITO Agent" && net start "HP ITO Agent"
and I have specified the FQDN of the target server on which the HP ITO Agent service should be restarted in the node field immediately below the Action field, but the action fails.
Does anyone have any tips on the correct command syntax for restarting the HP ITO Agent on a node via an Automatic Action?
Alternatively, if anyone has implemented the same task using a different method (e.g. a .bat file), I would be interested in that option as well.
While debugging the instrumentation file is the long-term solution to this behaviour, it will take a bit of time to sort through.
My hope is to implement a way of gracefully handling the disk utilization symptom so that the impact to monitoring on that server is minimal.
Thanking you all in advance.
9 Replies


Absent Member..
2009-04-28
09:25
Hi Michael,
When you stop the agent you are in effect stopping the program running the automatic action, so it won't be around to start itself up again.
Himanshu.


Absent Member..
2009-04-28
14:02
Hello Himanshu.
Thank you for your reply.
That makes sense.
Essentially you're saying that because the Automatic Action task would be run by the HP ITO Agent on the node itself, it cannot execute a command that stops itself and then be expected to start up again magically out of nowhere.
This means I need to look at a different recovery option altogether.
The first option I can think of is using the node's own System Event logs as the trigger for a local batch file that will stop and restart the HP ITO Service when a Disk Full event is logged there.
Alternatively, I may be able to use one of the Perl scripts in Dave Roth's "WIN32 Perl Scripting" book to stop and restart the service and simply schedule the script to run once or twice a day so that regardless of the state of the disk utilization the service gets restarted and with it the disk cleaned-up. This would mean I don't have to worry about trying to tie into the System Event log as a trigger for the script as it would simply be a scheduled task on the OS.
Thanks again Himanshu, your reply has helped me see the path I need to take.
5 points for your trouble.
Katsioulis Alex

Absent Member.
2009-04-28
14:16
Hi there,
As far as I can remember, OVOW 7.5 has 'opcragt'.
Are the files deleted when you stop the agent processes but not the communication process?
In other words, I would try executing the following from the OVO server:
> opcragt -stop
> opcragt -start
The stop/start is almost guaranteed to be problem-free.
Cheers,
alexk


Absent Member..
2009-04-28
14:51
Hello Michael,
Although I am far from being an OVOW expert, I might have information that could interest you. I have heard that one way of doing this is to use a tool called psexec. It lets you run a file (a batch file in this case) in an independent process, which will therefore not be terminated when the agent goes down. For this reason, it is able to restart the service on its own...
I am sorry I cannot help you more, but someone might have some additional information on setting psexec up for such a purpose 🙂
JonH_5

Absent Member.
2009-04-28
18:51
For Windows nodes you can use a VB script to stop and then start the OpenView services. This script can even be called from an Automatic or Operator Command, or from a Scheduled Task Policy. I use this script in a Scheduled Task Policy to restart the OM Agent on all my Windows nodes every Sunday, just as a way to keep the processes clean: I have seen the agent's memory usage slowly creep up on the nodes, and restarting frees the memory it has consumed.
The script I use is attached. It is written for the OM version 8 agent, which uses services named "HP OpenView Ctrl Service" and "HP Software Shared Trace Service", but I'm sure you can modify it to accommodate the DCE agent from OV 7.5.
The script has comments in it to help sort out what the next function would perform.
JON


Absent Member..
2009-04-29
07:18
Hi,
Create an abc.bat file on the node and add these lines:
net stop "HP ITO Agent"
net start "HP ITO Agent"
opcmsg a=a o=o msg_text="test"
Then run this batch file from the automatic action.
- gaurav -
Thanks
Gaurav


Absent Member..
2009-04-29
07:59
Hello and thank you all so much for your terrific responses.
After discussing the issue with one of my colleagues we were thinking of using the Windows System Log to trigger a .bat file to run when the disk full error is logged.
We have something similar to my current requirement set up for a different application's service, which needs restarting whenever the OS detects that it has stopped, but I have not had a chance to pin down the person who created this recovery task.
That said, I am keen to try the solution suggested by GAURAV as it seems the simplest to implement at this point, but I do appreciate all the other suggestions.
I will try this tomorrow and let you know how I go.
JonH_5

Absent Member.
2009-04-29
11:58
I cannot remember how many processes are associated with the DCE agent on 7.5, but I know that version 8.0 of the HTTPS agent has around a dozen. My issue with "net stop 'service'" is that there are situations where the service will not stop successfully, as is sometimes the case with the HTTPS agent. If this happens and you start the service again, processes left over from the failed stop may not have died off, so you end up with redundant copies running. At first, you should watch to be sure that "net stop" is truly killing the associated processes cleanly; if so, then the simple route is the way to go.
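To check that the stop really completed before starting the service again, something along these lines might work. This is only a sketch in Python, assuming Python is available on the node: it parses the STATE line that "sc query" prints and polls until the service reports STOPPED. The function names are hypothetical, and note that "sc query" expects the service's key name, which may differ from the display name used with "net stop".

```python
import re


def parse_service_state(sc_output):
    """Extract the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output."""
    # The STATE line looks like: "        STATE              : 4  RUNNING"
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def wait_until_stopped(query, attempts=10, pause=lambda: None):
    """Poll query() until the service reports STOPPED; return True on success.

    query is any callable returning the text of `sc query <key name>`;
    pause is called between attempts (e.g. a short sleep).
    """
    for _ in range(attempts):
        if parse_service_state(query()) == "STOPPED":
            return True
        pause()
    return False
```

On the node, query could be something like lambda: subprocess.run(["sc", "query", KEY_NAME], capture_output=True, text=True).stdout, with pause=lambda: time.sleep(2), where KEY_NAME is a placeholder for the agent's service key name. This only checks the service state; any leftover agent processes would still need a look in Task Manager.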
HTH
JON


Absent Member..
2009-04-29
16:59
Michael,
We used to have a custom agent monitoring policy that was able to restart the agent in case of a problem. As suggested by others, we had a small .bat file to kill and start the agent (opcagt -kill, opcagt -start on OVO7; ovc -kill, ovc -start on OVO8) that we scheduled via an "at" command with a slight delay. That way, we didn't kill the process running the action.
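For anyone wanting to script the delayed "at" approach, here is a rough sketch in Python of how the command line could be built. It assumes Python is available on the node. The batch file path in the usage note is a hypothetical placeholder, and the function name and two-minute default are made up for illustration, not taken from the policy described above.

```python
from datetime import datetime, timedelta


def build_at_command(script_path, now, delay_minutes=2):
    """Return the argument list for a one-off `at` job that runs script_path.

    `at` takes a 24-hour HH:MM start time with minute resolution, so the
    seconds are dropped first; the real delay is therefore somewhere between
    (delay_minutes - 1) and delay_minutes minutes.
    """
    run_at = now.replace(second=0, microsecond=0) + timedelta(minutes=delay_minutes)
    # Run the batch file through cmd /c so the scheduler does not depend on
    # the .bat file association.
    return ["at", run_at.strftime("%H:%M"), "cmd", "/c", script_path]
```

On the node, the returned list could be handed to subprocess.call() from the automatic action, for example with build_at_command(r"C:\scripts\restart_agent.bat", datetime.now()). The action returns straight away, and the scheduler service, not the agent, runs the batch file a minute or two later.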
Cheers,
Emmanuel.