
Parameter to restrict the number of servers that can be run as part of an HPSA job

We need a configurable parameter in HPSA that allows Admins to control the number of servers that can be grouped into a job run by an HPSA user, so that users are not killing a mesh by running a single job on thousands of servers at a time.

 

8 Comments
SivaKV (Visitor)

A configurable parameter would be useful. Here are some suggestions:

-- Admin users need the ability to bypass that option.

-- The parameter should be configurable per job type (e.g. up to 1000 servers per server script job, 250 servers per audit job, etc.).
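
As a rough illustration of the per-job-type idea (the parameter names below are hypothetical, not existing SA settings), a cap like this could be as simple as:

```python
# Hypothetical per-job-type caps; the names are illustrative only,
# not existing SA configuration parameters.
MAX_SERVERS_PER_JOB = {
    "server_script": 1000,
    "audit": 250,
}

def enforce_job_limit(job_type, server_count, is_admin=False):
    """Reject oversized jobs unless the submitter is an admin (bypass)."""
    if is_admin:
        return  # admins bypass the cap, per the first suggestion above
    limit = MAX_SERVERS_PER_JOB.get(job_type)
    if limit is not None and server_count > limit:
        raise ValueError(
            f"{job_type} jobs are limited to {limit} servers; got {server_count}"
        )

enforce_job_limit("audit", 200)                  # within the cap
enforce_job_limit("audit", 4000, is_admin=True)  # admin bypass
# enforce_job_limit("audit", 4000)               # would raise ValueError
```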

C_M (Trusted Contributor)

For SW remediation jobs, the concurrency is actually mediated by some Command Engine (way) parameters that limit the number of devices handled in one chunk. I got the below information from a previous Support call:

"Concurrency is controlled by a few configuration parameters (with their default values):

• way.max_remediations: 50

• way.max_remediations.action: 100

• way.remediate.chunk_size: 20

• way.remediate.max_concurrency: 10

At the top, we have the chunker process, which splits the job into smaller chunks. It is governed by chunk_size and max_concurrency: chunk size sets the highest number of servers in each chunk, while max_concurrency determines how many chunks will be running on a slice. A chunk can have fewer servers than stated in the chunk size. Each chunk is run by a doer process, which executes actions on the managed servers themselves. Each doer executes a number of session commands.

Commands are of two types: action and non-action. Non-action commands are download, reconcile.version and compliance; action commands are the installation of the packages themselves (anything that actually changes the server, which may be a script as well). max_remediations controls the maximum number of non-action phase commands executed on a core, while max_remediations.action controls the maximum number of action commands on a core."
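
To make the chunking concrete, here is a rough sketch (my own simplification for illustration, not SA's actual implementation) of how a 1000-server job would be split under the quoted defaults:

```python
# Rough illustration of the chunker behaviour with the quoted defaults.
# This is a simplified guess for illustration, not SA's actual code.
WAY_REMEDIATE_CHUNK_SIZE = 20       # max servers per chunk
WAY_REMEDIATE_MAX_CONCURRENCY = 10  # max chunks running on a slice at once

def split_into_chunks(servers, chunk_size=WAY_REMEDIATE_CHUNK_SIZE):
    """Split the job's server list into chunks of at most chunk_size servers."""
    return [servers[i:i + chunk_size] for i in range(0, len(servers), chunk_size)]

servers = [f"server-{n}" for n in range(1000)]
chunks = split_into_chunks(servers)

print(len(chunks))  # 50 chunks of 20 servers each; only 10 run at once,
                    # the rest wait until a doer process frees up
```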

I'm not sure whether the same rules apply to other types of jobs (audit, server script, etc.), but perhaps someone can enlighten me. Nevertheless, I do think it would be a good parameter to add (even though the tool already manages the job under the covers).

Micro Focus Expert
Status changed to: Waiting for Votes
 
Micro Focus Expert

Pushing this work back onto the user by creating a restriction on how many servers can be put in a job feels like a regression in functionality.
The software is supposed to simplify the task of automation, not create more work in doing so.
As a user, I really don't care what the system can or cannot handle. My goal is to run something over a batch of servers.
Why should I need to care about its limits? I should not have to.

It would be better if SA could be given a job of 1000 or more servers and limited itself so that it does not cause resource starvation that is detrimental to itself or to other jobs that may be executing. From my knowledge, the system does just that. If your system needs to be tuned to allow more load, you'll need to identify the job type and its corresponding tuning parameter. If there is a particular job type that exerts too much load, perhaps you could share that information.

Perhaps what you require is visibility into the amount of load the system is currently under. THAT would get my vote.

a162646 (Super Contributor)

I agree that having a restriction will complicate things for users. However, due to the convoluted nature of HPSA, there is currently no quick and easy way to identify bottlenecks when someone executes a job (be it an audit, software remediation or script) with thousands of servers. Unless there is an easy way to identify and fix such issues in a few hours, rather than taking days to resolve problems caused by jobs with a huge number of servers, we feel that having a restriction and circulating it among the user base is the more apt approach. This is nothing new; there are always system limits that can be put in place to ensure overall system stability, and we are asking for this parameter to ensure that stability. As admins, while we want HPSA to be used as a platform for automation, it is also upon us to make sure the application gives an equal level of performance to all users, irrespective of the type of jobs they run.

- Prudhviraj

csaunderson (Outstanding Contributor)

I have a practical implementation question: let's say this gets implemented, and it's set at 100 servers. I have a device group of 200 servers. How does the end user track that 100 of these servers have had the job run against them and 100 haven't? I guess a new flag could be put into the view for a server in a DG - remediated or some such - but then how do you handle job failures, etc.?

With my user hat on, I don't care about the system limits, I need the job to run.

With my SA administrator hat on, I don't like that my users run audits on 8000 servers at once, but I also understand the business and user needs that are driving them to submit that job, and I'm going to figure out how to make sure that SA can handle it (yes, I have those users too).

I would rather spend effort on scaling the system to be able to support large job runs instead of limiting user actions.

 

An alternative, and admittedly more limited, approach would be to make a warning limit configurable: "You are submitting a job that has over <x> devices in it, are you sure you want to do this?" I don't like this type of approach, but it would at least weed out the "oops" factor.
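
A minimal sketch of what that pre-submission warning might look like (warn_limit is an illustrative name, not an existing SA option):

```python
# Hypothetical pre-submission check for the warning-limit idea above.
def confirm_job_size(device_count, warn_limit=500):
    """Ask the user to confirm before submitting an unusually large job."""
    if device_count <= warn_limit:
        return True
    answer = input(
        f"You are submitting a job with {device_count} devices "
        f"(warning limit is {warn_limit}). Are you sure? [y/N] "
    )
    return answer.strip().lower() == "y"
```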

a162646 (Super Contributor)

You will have to figure out the implementation aspects, but as a customer my need is a tool that provides a better experience to our users. We have been using SA for a while, and there have always been jobs such as audits on 1000 servers, run by a few users (sometimes a single user), that impact the larger user base. This affects everyone's experience with the tool. I agree that capping the server count is not a permanent solution, but scaling the system isn't either. You scale the system to X, and then someone hits it again with a larger server count than before, forcing us to huddle again on scaling the system up to X+1 to handle the new load.

How do you deal with these kind of frequent situations on a permanent basis?

csaunderson (Outstanding Contributor)

(I'm a customer of MicroFocus).

I don't have a good answer: Like I said, I have users that run large scale audits, and I cringe every time they do. The way I've been solving it for heavyweight audits that I myself (as a user) run is to split the groups smaller and smaller. Longer term, I'll end up wrapping that in an OO flow that allows me to initiate, manage and report on the n number of audits that get spawned from a single job, because that's a better way to implement that per-job limit than trying to get SA to do it for me.

 

The fact that I can support 9000 servers in a single audit (that runs a single script, but 9000 times) is amazing.

 

As to "having to figure out the implementation aspects": I don't feel that it's right that we hand that over to MicroFocus to decide, because in the end, it's going to annoy everyone.  So put differently: would you want to set different limits based on the type of audit? Is an audit of 10000 servers that's a single script to be treated differently than a PCI DSS 3.2 audit that has 197 checks in it? Would a configuration option on the audit policy level to set a limit on the # of servers that this audit policy can be executed on in a single job that then is applied to a DG that it is run against make sense? A global context doesn't offer a lot of flexibility, I'd rather be able to benchmark an audit and then say "I'm comfortable with a single job against 1000 servers for this audit, but 500 for this one and 10 for this other one", and then have jobs cascade based on that. Middle ground would be to set a global limit but then have multiple jobs spawn to execute against a DG that is > than the limit. Or are you set on "no more than <x> servers in a job" as a hard limit?
