PeterM1977 (Trusted Contributor)

PPM 9.32.0001 random nodes crash in clustered Prod env.

Hello ...

Since upgrading PPM to version 9.32.0001 in April, we have been experiencing random node crashes almost every week. PPM runs on Windows Server 2008 R2 as a Windows service, using Java jdk1.7.0_75. Everything worked fine in 9.31.0002, even if the UI was slower. There is no useful information in the logs. The issue happens only in the clustered Production environment. The QA servers are configured the same way and show no issues, but they run under minimal load.

A known issue in 9.32 was files missing during startup. I copied them from the 9.31 backup, and I believe this is not the cause.

Any help would be appreciated ...

Thank you

Peter

Natalia_R_PPM (Absent Member)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hello Peter,

Enabling the debug logging lines below may surface more useful information in the logs.

When a PPM server node is not starting up and there is no clear error message in ServerLog.txt, you can add the following lines to logging.conf, which is located in the PPM_HOME/conf folder.

Changes to logging.conf do not require a server restart; debug logging starts automatically.

Here are the steps:

1-Add the following debug log settings to logging.conf:

com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = com.mercury.itg.core.jms.service,DEBUG
com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = com.mercury.itg.core.server.mdServices,DEBUG
com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = com.mercury.itg.core.scheduler,DEBUG
com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = com.mercury.itg.core.monitor.impl.ServerManagerImpl, DEBUG
com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = com.mercury.itg.core.jms.service.impl,DEBUG
com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL = org.quartz.core,DEBUG

(You can paste these lines at the end of logging.conf)

2-In addition, set the SYSTEM_THRESHOLD setting in logging.conf to DEBUG.

3-Start the node from the command line with kStart.sh -debug

4-Check for errors in ServerLog.txt, found in the PPM_HOME/server/server_name/logs folder.

5-Search for the PPM startup lines and see where startup hangs or fails.

6-The output should contain more detail than regular server logging. You can use it to troubleshoot, or send this log to HP Support.

STEPS to disable the debug level:

1-After gathering the logs, disable the debug settings by setting SYSTEM_THRESHOLD back to ERROR.

2-Comment out the debug lines or remove them from logging.conf.

NOTE: Keeping SYSTEM_THRESHOLD in DEBUG mode all the time is not recommended, as it can degrade performance. Use it only for debugging purposes.
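The enable and disable steps above could be scripted so that the debug block is easy to add and remove again. The following is only a minimal sketch, not an official tool: the BEGIN/END marker comments are hypothetical, it assumes that "#" starts a comment in logging.conf, and it assumes the threshold appears as a "key = value" line whose key ends in SYSTEM_THRESHOLD. It works on the file contents as text, so reading and writing PPM_HOME/conf/logging.conf (and backing it up first) is left to you.

```python
import re

# Packages taken from the debug lines listed in step 1.
DEBUG_PACKAGES = [
    "com.mercury.itg.core.jms.service",
    "com.mercury.itg.core.server.mdServices",
    "com.mercury.itg.core.scheduler",
    "com.mercury.itg.core.monitor.impl.ServerManagerImpl",
    "com.mercury.itg.core.jms.service.impl",
    "org.quartz.core",
]
LEVEL_KEY = "com.kintana.core.logging.PRODUCT_FUNCTION_LOGGING_LEVEL"

# Hypothetical marker comments so the block can be found and removed later.
BEGIN = "# BEGIN temporary debug block"
END = "# END temporary debug block"

# Assumption: the threshold is a line such as "<prefix>SYSTEM_THRESHOLD = ERROR".
THRESHOLD_RE = re.compile(
    r"^([ \t]*[\w.]*SYSTEM_THRESHOLD[ \t]*=[ \t]*)\S+[ \t]*$", re.MULTILINE
)


def set_threshold(conf_text, level):
    """Replace the value of every uncommented SYSTEM_THRESHOLD line."""
    return THRESHOLD_RE.sub(lambda m: m.group(1) + level, conf_text)


def enable_debug(conf_text):
    """Set the threshold to DEBUG and append the debug lines once."""
    if BEGIN in conf_text:
        return conf_text  # block already present, nothing to do
    block = [BEGIN]
    block += [f"{LEVEL_KEY} = {pkg},DEBUG" for pkg in DEBUG_PACKAGES]
    block.append(END)
    text = set_threshold(conf_text, "DEBUG")
    return text.rstrip("\n") + "\n" + "\n".join(block) + "\n"


def disable_debug(conf_text):
    """Remove the marked debug block and set the threshold back to ERROR."""
    kept, skipping = [], False
    for line in conf_text.splitlines():
        if line.strip() == BEGIN:
            skipping = True
        elif line.strip() == END:
            skipping = False
        elif not skipping:
            kept.append(line)
    return set_threshold("\n".join(kept) + "\n", "ERROR")
```

Because disable_debug always restores ERROR, this only suits installations where ERROR is the normal threshold, as in the steps above.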

Thanks

Natalia R

HP Support

PeterM1977 (Trusted Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hello Natalia

Yeah, I know how to set up DEBUG logging, but I cannot restart PPM PROD at just any time. I have my own node monitoring via WhatsUp Gold, which automatically starts crashed nodes again. Starting them back up is never a problem, but why do they crash? As I mentioned, our PPM is configured as a Windows service, so "kStart.sh -debug" cannot be used.

I know it's harder to troubleshoot without DEBUG logging. I just want to avoid high-priority tickets caused by possible performance degradation, timed-out processes / workflows, etc., so this would be a last resort.

Thank you

Peter

hyllplan (Frequent Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

We had crashing server nodes twice, and both times the cause was the HEAP settings in server.conf. There is an old bug that requires you to set them on every node, not only at the cluster level (I don't know if it still applies). The right HEAP setup also depends on many factors, and it is hard to troubleshoot in prod. Can't you put a test environment under a heavy stress test and see whether you can reproduce the issue?

About "As a known issue in 9.32 were the missing files during startup. I copied them from 9.31 backup. I believe this is not the cause.": what issue is that? We're upgrading in a week... :)

PeterM1977 (Trusted Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hey 🙂

I will check the heap, but there is no Java OutOfMemory error in the logs ...

Regarding the missing files: I was unable to start PPM after the upgrade to 9.32.0001 because of missing files: ppm_dm-web.war, ppm-tm-hpa-web.war and ppm-tsapproval-hpa-web.war. This issue in patch 0001 seems to be corrected in the new patch 0002 for 9.32 (released on April 18th) -> QCCR1L60030 - After applying PPM 9.32.0001, PPM server cannot be started if mobile website client is enabled (This issue is caused by missing files in the package. PPM server can now be started when mobile website client is enabled.) ... so try applying patch 0002 🙂

good luck

P.

hyllplan (Frequent Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Look for hs_err_pid[nnnn].log in the PPM directory; that's how we found out. The error was not always written to the server log. It seems the process died before it could report it.
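To collect these crash files across several nodes, a small script can help. This is a rough sketch under some assumptions: the JVM writes its fatal error files under the default name hs_err_pid<pid>.log somewhere below the directory you pass in, and each file starts with a block of lines beginning with "#". The function names are made up for illustration.

```python
from pathlib import Path


def find_crash_logs(root):
    """Return hs_err_pid*.log files below root, newest first."""
    logs = Path(root).rglob("hs_err_pid*.log")
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


def crash_header(path, max_lines=40):
    """Return the non-empty '#' lines from the top of a JVM error file."""
    header = []
    with open(path, errors="replace") as fh:
        for i, line in enumerate(fh):
            # Stop at the first line that is not part of the '#' header.
            if i >= max_lines or not line.startswith("#"):
                break
            text = line.lstrip("#").strip()
            if text:
                header.append(text)
    return header
```

Calling find_crash_logs on the PPM home directory of each node and printing crash_header for every hit gives a quick overview of when and how the nodes died, without touching the running service.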

PeterM1977 (Trusted Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Thanx Hyllplan

I almost forgot that this log is there too. It seems you were right about the HEAP. I found the "out of memory for Java Runtime Environment" error there, but also (more often) an EXCEPTION_ACCESS_VIOLATION error, where the crash happened outside the Java VM in native code. That may also be related to heap size. Our current HEAP setting is 1280m. I will increase it to 2048m.

There is another recommendation to set a larger code cache with -XX:ReservedCodeCacheSize= ... We don't have this parameter set. Are you using it?
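Since two different kinds of crash show up in these files, it may help to count how often each one occurs before changing any settings. The sketch below sorts crash-file text into rough buckets. The marker strings are assumptions about the wording the JVM uses in its error files, so adjust them to what your own files actually contain.

```python
from collections import Counter

# Assumed marker strings (lower case); not an exhaustive list.
MEMORY_MARKERS = ("insufficient memory", "out of memory", "outofmemory")
NATIVE_MARKERS = ("exception_access_violation", "sigsegv")


def classify_crash(text):
    """Put the text of one crash file into a rough category."""
    lowered = text.lower()
    if any(marker in lowered for marker in MEMORY_MARKERS):
        return "out-of-memory"
    if any(marker in lowered for marker in NATIVE_MARKERS):
        return "native-access-violation"
    return "other"


def summarize(texts):
    """Count how many crash files fall into each category."""
    return Counter(classify_crash(t) for t in texts)
```

If most files land in the native bucket, raising the heap alone may not stop the crashes, so it is worth knowing the split first.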

Thanx

P.

Erik Cole (Acclaimed Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hi...we're experiencing random node crashes in 9.32.0002 as well. Look in that hs_err_pid[nnnn].log for text like

Problematic frame: # C [libzip.so+0x8099] deflate_slow

...it seems there's a known issue with gzip, and you might be hitting something similar...
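To check many crash files for the same frame, the "Problematic frame" entry can be pulled out programmatically. This is a sketch only: it assumes the frame text follows the "Problematic frame:" label either on the same line (as quoted above) or on the next non-empty "#" line, and that the native library appears as [name+0xOFFSET]. The helper names are hypothetical.

```python
import re

LABEL = "Problematic frame:"
# Matches e.g. "[libzip.so+0x8099]" and captures "libzip.so".
LIBRARY_RE = re.compile(r"\[([^\]+]+)\+0x[0-9a-fA-F]+\]")


def problematic_frame(lines):
    """Return the frame text that follows the label, or None if absent."""
    cleaned = [line.lstrip("#").strip() for line in lines]
    for i, text in enumerate(cleaned):
        if text.startswith(LABEL):
            rest = text[len(LABEL):].lstrip("# ").strip()
            if rest:
                return rest  # frame is on the same line as the label
            for following in cleaned[i + 1:]:
                if following:
                    return following  # frame is on the next non-empty line
    return None


def frame_library(frame):
    """Return the native library named in a frame, or None."""
    match = LIBRARY_RE.search(frame or "")
    return match.group(1) if match else None
```

If the same compression library keeps turning up across crashes, that points toward the gzip issue rather than heap sizing.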

hyllplan (Frequent Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hi all
We're going live with 9.32.0002 in two days and don't want crashing nodes 😉 Could HP comment on whether crashing nodes are a known error?
Peter - we use the following settings for heap:
com.kintana.core.server.SERVER_INIT_HEAP_SIZE=2048m
com.kintana.core.server.SERVER_MAX_HEAP_SIZE=2048m
com.kintana.core.server.SERVER_MAX_PERM_SIZE=512m
We set them for each node: 3 usr nodes, one srv node.
I don't know about that parameter. Where is it set? When starting the PPM nodes? In server.conf?
Johan
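Since the heap values have to be present for every node, a quick check of each node's settings can catch a missing or inconsistent value. The sketch below is illustrative only: it takes the text of one node's settings (how server.conf separates nodes is not shown here, so splitting the file per node is left out), assumes "#" starts a comment, and assumes sizes are written as a number followed by k, m or g.

```python
import re

HEAP_KEYS = ("SERVER_INIT_HEAP_SIZE", "SERVER_MAX_HEAP_SIZE", "SERVER_MAX_PERM_SIZE")
MIB_PER_UNIT = {"k": 1 / 1024, "m": 1, "g": 1024}


def size_to_mib(value):
    """Convert a size such as '2048m' or '2g' to MiB."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", value.strip())
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * MIB_PER_UNIT[match.group(2).lower()]


def heap_settings(node_text):
    """Collect the heap-related values (in MiB) from one node's settings."""
    found = {}
    for line in node_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        short_key = key.strip().rsplit(".", 1)[-1]
        if short_key in HEAP_KEYS:
            found[short_key] = size_to_mib(value)
    return found


def heap_problems(settings):
    """List obvious problems: missing keys, or initial heap above maximum."""
    problems = [f"{key} is not set" for key in HEAP_KEYS if key not in settings]
    init = settings.get("SERVER_INIT_HEAP_SIZE")
    maximum = settings.get("SERVER_MAX_HEAP_SIZE")
    if init is not None and maximum is not None and init > maximum:
        problems.append("initial heap exceeds maximum heap")
    return problems
```

Running heap_problems on the settings of every node and expecting an empty list for each is a cheap check to repeat after every patch.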
PeterM1977 (Trusted Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

Hello Hyllplan, Erik

regarding Gzip ...

Yeah, I know about this issue. I changed GZIP_ENCODING_ENABLED to false in tune.conf (v9.31) a year ago ... I checked the current settings and found that tune.conf was replaced during the upgrade to 9.32, so this parameter is enabled again ... how sweet 🙂

Hyllplan,

If I understand your question correctly, the heap size can be set in server.conf. In our case it is set for each node in the Cluster-Specific Configuration section. As I mentioned, I have to increase the HEAP sizes to 2048 (currently 1280) and the PERM size to 512 (currently 256).

Which version do you currently have?

Both pieces of information are very useful, even though the gzip encoding had already been "fixed" before. It seems we should check all modified files after any upgrade or patch to make sure all parameters are still set correctly.
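Checking the files after an upgrade can be partly automated by comparing the parameters in a pre-upgrade backup with the current file. The following is a minimal sketch assuming simple "key=value" lines with "#" comments. The short parameter name in the usage note is for illustration only and may differ from the full key in tune.conf. A key that appears several times in one file keeps only its last value here.

```python
def parse_params(conf_text):
    """Parse 'key=value' lines into a dict, skipping blanks and comments."""
    params = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def changed_params(before_text, after_text):
    """Return {key: (old, new)} for keys whose value differs or is missing."""
    before = parse_params(before_text)
    after = parse_params(after_text)
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

Given a backup containing GZIP_ENCODING_ENABLED=false and a current file containing GZIP_ENCODING_ENABLED=true, changed_params reports that key with the pair ("false", "true"), which is exactly the silent reset described above.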

Thank you, I will keep you posted

P.

hyllplan (Frequent Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

PeterM1977 (Trusted Contributor)

Re: PPM 9.32.0001 random nodes crash in clustered Prod env.

yeah .. copied from Release notes ... 🙂

P.
