Vice Admiral Vice Admiral
Vice Admiral

way.command_shutdown_grace_period recommendations?

Per QCCR1D240570 a new parameter "way.command_shutdown_grace_period" has been added to avoid the following scenario:
 
If you are running numerous batches of remediations, and each batch has at least a single server that hangs
on software inventory, then it will look like all of their remediation jobs are
hanging. You have to open each job individually and search for the hung managed server.

In this case, some Windows managed servers are hanging due to
interactions with the Windows Update Agent (WUA) as part of the software
inventory gathering.
 
Are there any recommendations on what value way.command_shutdown_grace_period should be set to?
Are there any pros/cons to setting it to anything but the default (-1)?
 

Accepted Solutions
Vice Admiral Vice Admiral
Vice Admiral

Hello,

Here are the answers to your questions:

1. I had assumed that the value was in seconds. Your response sounds like it is in minutes. Is that correct?

Answer: The value is in seconds.

2. Does this timeout apply to ALL individual way commands?

Answer: The short answer is "Yes". The long answer - the way.command_shutdown_grace_period parameter is used when the WAY waits for the SA agent to provide the command response. If the agent doesn't respond in time (<timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>) then it leads to a timeout and the agent command status is set to failure.

3. So for a given server in a job if there are 10 commands/actions against this server, would the total timeout potentially be <timeout set on job> + <way.command_shutdown_grace_period> * 10 + <150 seconds>?

Answer: The <way.command_shutdown_grace_period> is applied on each command individually. If the first command times out then the whole job will fail after <timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>. If the first command succeeds then the same <way.command_shutdown_grace_period> will be applied to the next command and so on.
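
To make the per-command behavior concrete, here is a minimal Python sketch. It is not SA code. The function names and the idea of feeding in a list of command durations are hypothetical, and it assumes a non-negative grace period and that the same limit applies to every command in turn.

```python
from typing import Optional, Sequence

# The fixed 150 seconds quoted in the answer above.
FIXED_OVERHEAD_S = 150


def command_limit(job_timeout_s: int, grace_s: int) -> int:
    """Longest a single command may run before it is marked as failed.

    Assumes grace_s >= 0; the special value -1 is not handled here.
    """
    return job_timeout_s + grace_s + FIXED_OVERHEAD_S


def first_timed_out_command(
    durations_s: Sequence[int], job_timeout_s: int, grace_s: int
) -> Optional[int]:
    """Return the index of the first command that exceeds the limit.

    Commands are checked in order, each against the same limit, which
    mirrors "applied on each command individually". None means every
    command finished in time.
    """
    limit = command_limit(job_timeout_s, grace_s)
    for index, duration in enumerate(durations_s):
        if duration > limit:
            return index
    return None


if __name__ == "__main__":
    # Three commands against one server; the second one hangs for 3000 s.
    print(command_limit(1800, 600))                             # 2550
    print(first_timed_out_command([100, 3000, 50], 1800, 600))  # 1
```

The point of the sketch is that the grace period is not multiplied by the number of commands up front: each command gets its own window, and the job fails at the first command that overruns it.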

 

Micro Focus Customer Support

If you find that this or any post resolves your issue, please be sure to mark it as an accepted solution.

Captain Captain
Captain

Hi Suzanne,

Hope you are doing fine today!

I took a look at the QCCR mentioned above and some related cases, and put together a summary of the internal discussions on this matter:

All of our job types have internal timeouts around the execution of actions (internally we call them commands) against managed servers. The change that occurred around 10.23 was that the "Command Engine" (waybot) will not give up on one of these managed server actions (commands) as long as the agent reports that it is still working on the action.
This effectively prevents the parent job from ever timing out if some sort of hang occurs on the agent side. The purpose of this new parameter is to allow a configurable middle ground between the old behavior and the new behavior.

A value of 0 means to effectively use the old behavior.

Now the old behavior was for the job to mark the action (command) against the managed server as failed after the timeout was reached... regardless of whether or not the action was still executing on the agent side.

This behavior is arguably undesirable, as the job report in HPSA is *implying* that all work related to the job is complete. Yet it is possible that there could still be actions executing on managed servers that the job was executed against.
A simple example would be if you ran a server script with, say, "while true; do sleep 1; done". That script will never end. Pre 10.23, the job would register a completed status after the specified timeout; post 10.23, the job will remain active as long as the agent reports that it is still executing the script.
(Actually, there might be an agent-side timeout for server scripts whereby the agent's batchbot will kill the child process after the timeout period, but the idea is still a good analogy.)

So a value of 0 says to effectively use the old behavior: HPSA will provide no "grace" for running beyond the specified timeout for the action against the managed server. A value of -1 means to effectively use the new post 10.23 behavior. Positive values are the middle ground.
It is entirely up to the customer whether to use a positive value, 0, or -1.
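
The three cases can be summarized in a small Python sketch. This is only an illustration of the semantics described in this thread, not SA code. The function name is hypothetical, and rejecting values below -1 is my own assumption, since the thread does not say how such values are handled.

```python
def describe_grace_period(value: int) -> str:
    """Explain what a way.command_shutdown_grace_period value means.

    Based only on the behavior described in this thread.
    """
    if value == -1:
        # Default: post-10.23 behavior.
        return "wait as long as the agent reports it is still working"
    if value == 0:
        # Pre-10.23 behavior.
        return "no grace beyond the action timeout"
    if value > 0:
        # Middle ground: bounded extra wait.
        return f"wait up to {value} extra seconds past the action timeout"
    # Assumption: anything below -1 is treated as invalid in this sketch.
    raise ValueError(f"unsupported grace period: {value}")


if __name__ == "__main__":
    for candidate in (-1, 0, 3600):
        print(candidate, "->", describe_grace_period(candidate))
```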

We have no way of providing a "recommended" positive value, but we can explain what the different values mean, and the customer can decide. A value of 3600 means that the waybot will wait an additional 3600 minutes past the specified timeout for the managed server action.

Most of our timeouts around actions that involve implicit software/patch listings are usually around 1800s. So if the WUA interaction ends up taking 45 minutes, then with the old behavior the command would have been marked as failed. With the new post 10.23 behavior, the job would have been held as ACTIVE until the command finally completed.
If way.command_shutdown_grace_period was set to, say, 1800, then a command that ended up taking 45 minutes would still register its completion, even though it overran its timeout.
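
Working that 45-minute example through in Python, as a rough sketch only: the function names are hypothetical, the fixed 150 seconds comes from the formula quoted in the accepted solution rather than from this post, and -1 is modeled as "no limit" on the assumption that the agent keeps reporting progress and the command eventually finishes.

```python
# Fixed 150 seconds from the formula in the accepted solution.
FIXED_OVERHEAD_S = 150


def completes_within_limit(duration_s: int, action_timeout_s: int, grace_s: int) -> bool:
    """True if a command of the given duration finishes before being failed."""
    if grace_s == -1:
        # Assumption: the agent keeps reporting that it is still working.
        return True
    return duration_s <= action_timeout_s + grace_s + FIXED_OVERHEAD_S


def min_grace_needed(duration_s: int, action_timeout_s: int) -> int:
    """Smallest non-negative grace period that lets the command finish."""
    return max(0, duration_s - action_timeout_s - FIXED_OVERHEAD_S)


if __name__ == "__main__":
    wua_duration_s = 45 * 60  # 2700 seconds
    for grace in (0, 1800, -1):
        print(grace, completes_within_limit(wua_duration_s, 1800, grace))
    # Prints: 0 False, 1800 True, -1 True
    print(min_grace_needed(wua_duration_s, 1800))  # 750
```

Under these assumptions, a grace period of 1800 leaves comfortable headroom for a 45-minute inventory, while still bounding how long a truly hung server can hold the job open.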

Hope this helps,

Regards,

Manfred Monge
Customer Support Engineer

Vice Admiral Vice Admiral
Vice Admiral

I had assumed that the value was in seconds.  Your response sounds like it is in minutes.   Is that correct?

Does this timeout apply to ALL individual way commands?   For instance, would it apply to

  • agent upgrades (analyze, download, perform upgrade)
  • individual actions/commands in a patch or software remediate job (analyze, chunker, doer, download, install, scan for compliance, reboot),
  • communications test,
  • run server script,
  • individual actions in run OS build plan
  • etc

 

So for a given server in a job if there are 10 commands/actions against this server, would the total timeout potentially be

<timeout set on job> + <way.command_shutdown_grace_period> * 10 + <150 seconds>

where the way.command_shutdown_grace_period could be applied 10 times because there are 10 commands

OR is it truly

<timeout set on job> + <way.command_shutdown_grace_period> + <150 seconds>

as documented in the QC?  Where the way.command_shutdown_grace_period is only applied once per server?

 

Captain Captain
Captain

Hi Suzanne,

 

Wondering if you have any update on this matter? 

Please let us know if there is anything else we can do for you, or please mark this as Answered if the response was successful.

Regards,

Manfred Monge
Customer Support Engineer
