JasonCantrell

[OO Tip] Terracotta cluster database has become corrupt

Problem:

 

At least one node in an Operations Orchestration (OO) cluster starts and immediately stops, logging the following error in %ICONCLUDE_HOME%\Clustering\terracotta\terracotta-data\server-logs\terracotta-server.log:

 

[WrapperStartStopAppMain] WARN com.tc.objectserver.persistence.sleepycat.DBEnvironment - Unable to open DB environment. (JE 3.3.74) Read invalid log entry type: 54 Retrying after 500 ms

 

In the Windows Event Viewer the following event will be logged:

The RSGridServer service terminated with service-specific error The system cannot find the file specified.

 

The cause of this error is that the Terracotta cluster Berkeley database (DB), which holds state and statistical data, has become corrupt.
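As a quick way to confirm that a node shows this symptom, the log can be searched for the error signature. The following is a minimal sketch, not part of the original tip: the function name `find_corruption_lines` and the regular expression are illustrative assumptions based only on the log excerpt above, and the JE version and entry type number are deliberately left unmatched because they may differ between systems.

```python
import re
from pathlib import Path

# Signature derived from the log excerpt above. The JE version and the
# entry type number are not matched, since they may vary (assumption).
CORRUPTION_PATTERN = re.compile(
    r"DBEnvironment - Unable to open DB environment\..*Read invalid log entry type"
)


def find_corruption_lines(log_path):
    """Return the log lines that match the corruption signature."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if CORRUPTION_PATTERN.search(line)]
```

Pass it the full path to terracotta-server.log on the affected node; an empty list means the signature was not found in that file.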

 

Resolution:

 

Steps to correct the issue:

1. Stop all Operations Orchestration related services on all Central nodes:

RSGridServer (The OO Terracotta Clustering Service)

RSCentral (The OO Central Service)

RSJRAS (The OO Remote Access Service)

RSScheduler (The OO Scheduler)

RSCluster (The OO Load Balancing Component) – if present

Note: If the RSGridServer (terracotta) component is running on any stand-alone servers, it must be stopped there as well.

 

2. Back up the current terracotta database by copying %ICONCLUDE_HOME%/Clustering/terracotta/terracotta-data to a backup directory on every server running the RSGridServer service.

 

3. Delete the instance of the Terracotta database by deleting %ICONCLUDE_HOME%/Clustering/terracotta/terracotta-data from every server running the RSGridServer service.

 

4. Pick one of the OO servers and restart the OO services on that node of the cluster, in the preferred order below:

RSGridServer (The OO Clustering Service)

RSCentral (The OO Central Service)

RSJRAS (The OO Remote Access Service)

RSScheduler (The OO Scheduler)

RSCluster (The OO Load Balancing Component) – if present

 

5. Allow time for the Central in step 4 to initialize completely, and verify that you can log on to the Central web page. Once the first Central is confirmed to be up and running, start the OO services on the other Central nodes in the same order as in step 4, allowing each Central node to initialize completely before starting the next. In a two-node cluster, for example, perform step 4 on the Central server you choose to start first, then start the second Central server in this step.

 

6. Check terracotta_wrapper.log to confirm that the errors no longer occur. If the log looks clean, continue monitoring the Central servers and terracotta_wrapper.log to confirm that the problem is resolved.
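The service handling and data reset in steps 1 through 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an HP-supplied script: the function names `service_commands` and `backup_and_reset` are hypothetical, using the Windows `net stop` / `net start` commands is my choice of tool, and the timestamped backup folder name is an assumption. The service names and their order are taken from the lists above.

```python
import shutil
import time
from pathlib import Path

# Service names and order as listed in steps 1 and 4.
OO_SERVICES = ["RSGridServer", "RSCentral", "RSJRAS", "RSScheduler", "RSCluster"]


def service_commands(action, installed=OO_SERVICES):
    """Build 'net stop' or 'net start' commands in the listed order,
    skipping services that are not installed (e.g. RSCluster)."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["net", action, name] for name in OO_SERVICES if name in installed]


def backup_and_reset(data_dir, backup_root):
    """Copy terracotta-data to a timestamped backup (step 2), then
    delete the original directory (step 3). Returns the backup path."""
    data_dir = Path(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed naming scheme
    backup_dir = Path(backup_root) / f"{data_dir.name}-{stamp}"
    shutil.copytree(data_dir, backup_dir)  # raises if backup_dir already exists
    shutil.rmtree(data_dir)  # only reached when the copy succeeded
    return backup_dir
```

On each Windows server, every command returned by `service_commands` could be run from an elevated prompt, for example with `subprocess.run(cmd, check=True)`. Call `backup_and_reset` only after all services on all nodes are stopped, so that the copy is taken from a quiescent directory.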

 

Direct link to the document here:

http://support.openview.hp.com/selfsolve/document/KM1208401

HP Support