

on software inventory, then it will look like all of their remediation jobs are
hanging. You have to open each job individually and search for the hung managed server.
In this case, some Windows managed servers are hanging due to
interactions with the Windows Update Agent (WUA) as part of the software
inventory gathering.
Accepted Solutions


Hello,
Here are the answers to your questions:
1. I had assumed that the value was in seconds. Your response sounds like it is in minutes. Is that correct?
Answer: The value is in seconds.
2. Does this timeout apply to ALL individual way commands?
Answer: The short answer is "Yes". The long answer: the way.command_shutdown_grace_period parameter is used while the WAY waits for the SA agent to return the command response. If the agent doesn't respond within <timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>, the command times out and the agent command status is set to failure.
3. So for a given server in a job if there are 10 commands/actions against this server, would the total timeout potentially be <timeout set on job> + <way.command_shutdown_grace_period> * 10 + <150 seconds>?
Answer: The <way.command_shutdown_grace_period> is applied on each command individually. If the first command times out then the whole job will fail after <timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>. If the first command succeeds then the same <way.command_shutdown_grace_period> will be applied to the next command and so on.
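To make the per-command arithmetic above concrete, here is a minimal Python sketch. It is only an illustration, not SA code: the function names are hypothetical, it assumes a non-negative grace period, and it assumes the job timeout is checked against each command separately, as the answer describes.

```python
from typing import Optional, Sequence

# The fixed "<150 seconds>" term quoted in the answer.
FIXED_OVERHEAD_SECONDS = 150


def command_deadline_seconds(job_timeout: int, grace_period: int) -> int:
    """Seconds the WAY waits for a single command before marking it failed.

    Hypothetical helper: assumes grace_period >= 0 (0 means no grace).
    """
    return job_timeout + grace_period + FIXED_OVERHEAD_SECONDS


def first_timed_out_command(
    durations: Sequence[int], job_timeout: int, grace_period: int
) -> Optional[int]:
    """Return the index of the first command that exceeds its deadline.

    The same deadline is applied to every command in turn; the grace
    period is not multiplied by the number of commands. Returns None
    when every command responds in time.
    """
    deadline = command_deadline_seconds(job_timeout, grace_period)
    for index, duration in enumerate(durations):
        if duration > deadline:
            return index  # the job fails at this command
    return None


# With a 1800 s timeout and a 3600 s grace period, each command gets
# 1800 + 3600 + 150 = 5550 seconds.
print(command_deadline_seconds(1800, 3600))
print(first_timed_out_command([100, 6000, 50], 1800, 3600))
```

With 10 commands, each one is measured against the same 5550-second window; the job fails at the first command that overruns it.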
If you find that this or any post resolves your issue, please be sure to mark it as an accepted solution.


Hi Suzanne,
Hope you are doing fine today!
I reviewed the QCR you mentioned and some other related cases, and summarized the internal discussions on this matter:
All of our job types have internal timeouts around the execution of actions (internally we call them commands) against managed servers. The change that occurred around 10.23 was that the "Command Engine" (waybot) will not give up on one of these managed server actions (commands) as long as the agent reports that it is still working on the action.
This effectively prevents the parent job from ever timing out if some sort of hang occurs on the agent side. The purpose of this new parameter is to allow a configurable middle ground between the old behavior and the new behavior.
A value of 0 means to effectively use the old behavior.
The old behavior was for the job to mark the action (command) against the managed server as failed once the timeout was reached, regardless of whether the action was still executing on the agent side.
This behavior is arguably undesirable, because the job report in HPSA *implies* that all work related to the job is complete, yet actions could still be executing on the managed servers that the job was run against.
A simple example would be a server script containing, say, "while true; do sleep 1; done". That script never ends. Pre 10.23, the job would register a completed status after the specified timeout; post 10.23, the job remains active as long as the agent reports that it is still executing the script.
(Actually, there might be an agent-side timeout for server scripts whereby the agent's batchbot kills the child process after the timeout period, but the analogy still holds.)
So a value of 0 means to effectively use the old behavior: HPSA provides no "grace" for running beyond the specified timeout for the action against the managed server. A value of -1 means to effectively use the new post 10.23 behavior. Positive values are the middle ground.
It is entirely up to the customer whether to use a positive value, 0, or -1.
We have no way of providing a "recommended" positive value, but we can explain what the different values mean and the customer can decide. A value of 3600 means that the waybot will wait an additional 3600 seconds (one hour) past the specified timeout for the managed server action.
Most of our timeouts around actions that involve implicit software/patch listings are around 1800 seconds (30 minutes). So if the WUA interaction ends up taking 45 minutes, then with the old behavior the command would have been cancelled. With the new post 10.23 behavior, the job would have been held as ACTIVE until the command finally completed.
If the grace period were set to, say, 1800, then a command that ended up taking 45 minutes would still register its completion despite over-running the timeout.
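The three kinds of values (0, -1, positive) and the 45-minute example can be sketched in a few lines of Python. This is a simplified model for illustration only: the function names are hypothetical, the status strings are made up, and it leaves out the extra fixed 150-second term mentioned elsewhere in this thread.

```python
import math


def effective_wait_seconds(action_timeout: float, grace_period: float) -> float:
    """How long the waybot waits for one managed server action.

    grace_period == 0  -> old behavior: no grace past the timeout
    grace_period == -1 -> post 10.23 behavior: wait as long as the agent
                          reports that it is still working
    grace_period > 0   -> middle ground: timeout plus the grace period
    """
    if grace_period == -1:
        return math.inf
    if grace_period < 0:
        raise ValueError("grace_period must be -1, 0, or a positive value")
    return action_timeout + grace_period


def action_status(duration: float, action_timeout: float, grace_period: float) -> str:
    """Illustrative outcome for an action that runs for `duration` seconds."""
    if duration <= effective_wait_seconds(action_timeout, grace_period):
        return "completed"
    return "failed"


wua_duration = 45 * 60  # the 45-minute WUA interaction, in seconds
for grace in (0, 1800, -1):
    print(grace, action_status(wua_duration, 1800, grace))
```

With an 1800-second timeout, the 45-minute (2700-second) action fails with a grace period of 0, but completes with a grace period of 1800 (1800 + 1800 = 3600 seconds allowed) or with -1.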
Hope this helps,
Regards,
Customer Support Engineer
If you find that this or any other post resolves your issue, please be sure to mark it as an accepted solution.
If you are satisfied with anyone’s response please remember to give them a KUDOS by clicking on the STAR at the bottom left of the post and show your appreciation.


I had assumed that the value was in seconds. Your response sounds like it is in minutes. Is that correct?
Does this timeout apply to ALL individual way commands? For instance, would it apply to
- agent upgrades (analyze, download, perform upgrade)
- individual actions/commands in a patch or software remediate job (analyze, chunker, doer, download, install, scan for compliance, reboot),
- communications test,
- run server script,
- individual actions in run OS build plan
- etc
So for a given server in a job if there are 10 commands/actions against this server, would the total timeout potentially be
<timeout set on job> + <way.command_shutdown_grace_period> * 10 + <150 seconds>
where the way.command_shutdown_grace_period could be applied 10 times because there are 10 commands
OR is it truly
<timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>
as documented in the QC? Where the way.command_shutdown_grace_period is only applied once per server?




Hi Suzanne,
Wondering if you have any update on this matter?
Please let us know if there is anything else we can do for you, or please mark this as Answered if the response was successful.
Regards,
Customer Support Engineer