Check ZLM Client Requests from Tomcat Localhost Access Log


Sometimes the ZLM server has a significantly higher CPU load than it should have. Several things can cause this, and here is one thing you can check when you have that problem.



Sometimes one or two out of about 300 ZLM clients run into a problem when they refresh their assigned bundles. The client seems to fall into a loop and sends the ZLM server hundreds of action requests. The server has to answer all of them, and each answer requires some database and eDirectory queries, which can cause the high CPU load. Just restarting the ZLM client on such a device stops those requests, and the client resumes its normal function after that. The looping client logs hundreds of those requests in its own zmd-messages.log file, with lines like the following:



... 
10 Dec 2007 17:12:26 DEBUG ActionManager Invoking action 'install'
10 Dec 2007 17:12:26 DEBUG ScheduledActionsModule No bundles to transact
10 Dec 2007 17:12:27 DEBUG ActionManager Invoking action 'install'
10 Dec 2007 17:12:27 DEBUG ScheduledActionsModule No bundles to transact
...


Having a few such entries in your client log files is okay, because you might have bundles assigned. A very high number of them is not.
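To get a feeling for how many of those entries a single client has written, a quick count on the device itself can help. This is only a minimal sketch: the function name count_install_actions is made up for illustration, and the location of zmd-messages.log differs between installations, so the path is passed in as an argument rather than assumed.

```shell
#!/bin/bash
# Sketch: count the "Invoking action 'install'" lines in a zmd-messages.log.
# count_install_actions is a hypothetical helper name, not part of ZLM.

count_install_actions() {
    # $1: path to the client's zmd-messages.log (location is installation specific)
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c "Invoking action 'install'" "$1"
}
```

Call it as count_install_actions followed by the path to your zmd-messages.log. A handful of hits is nothing to worry about; hundreds within a few minutes point to the loop described above.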



This simple bash script grabs all client IP addresses from the current localhost_access.log of the ZLM Tomcat instance and counts the requests per managed device. It shows you a list of requests per device, so you can decide where you would like to restart the ZLM client.
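The counting idea can be sketched in a few lines of bash. This is not the original script, only an illustration under some assumptions: the client IP address is taken to be the first whitespace-separated field of each log line (as in Tomcat's common access log pattern), the log file path is passed as an argument because it depends on your installation, and the function names count_requests and report_requests are made up for this example. The name lookup uses getent hosts, which may format names differently from your own resolver.

```shell
#!/bin/bash
# Sketch: count requests per client IP in a Tomcat access log.
# Assumption: the client IP is the first field of every log line.
# count_requests and report_requests are hypothetical helper names.

count_requests() {
    # $1: path to the access log file
    # Prints "IP COUNT" per client, highest count first.
    awk 'NF { count[$1]++ } END { for (ip in count) print ip, count[ip] }' "$1" \
        | sort -k2,2nr -k1,1
}

report_requests() {
    # $1: path to the access log file
    # Adds a host name to each line where a lookup succeeds.
    count_requests "$1" | while read -r ip n; do
        name=$(getent hosts "$ip" | awk '{ print $2; exit }')
        echo "Client: $ip / ${name:-unknown}. Requests: $n"
    done
}
```

Call report_requests with the path to the current access log of your ZLM Tomcat instance. Clients whose count is far above the rest are the candidates for a client restart.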



Here is a sample output:



# /opt/company/bin/check_zlm_client_requests.sh
Client: 10.10.10.1 / LX-SRV001.DOMAIN.COM. Requests: 9
Client: 10.10.11.1 / LX-SRV002.DOMAIN.COM. Requests: 6266
Client: 10.10.12.1 / LX-SRV003.DOMAIN.COM. Requests: 8
Client: 10.10.13.1 / LX-SRV004.DOMAIN.COM. Requests: 8
Client: 10.10.14.1 / LX-SRV005.DOMAIN.COM. Requests: 8
Client: 10.10.15.1 / LX-SRV006.DOMAIN.COM. Requests: 8
Client: 10.10.16.1 / LX-SRV007.DOMAIN.COM. Requests: 6280


In this case I restarted the ZLM client with 'rug restart --force' on the devices LX-SRV002 and LX-SRV007, and the CPU load on the ZLM server came down from about 60% utilization to about 5%! Not so bad ...



I use this script frequently to check the number of requests. The script does not solve the underlying problem; it just helps you find the cause of such high utilization problems on your ZLM server. If the problem comes back, it would be a good idea to ask Novell support for help.



If you need further information, read the header of the bash script or ask at the Novell public forums at forums.novell.com.



Rainer
