[OO Tip] Central Terracotta Database Corrupt and Service Unable to Start

Issue:

An error similar to the following appears in the %ICONCLUDE_HOME%/Clustering/terracotta/terracotta-data/server-logs/terracotta-server.log file:

 

2011-01-17 09:43:26,210 [WrapperStartStopAppMain] ERROR com.tc.server.TCServerMain - Thread:Thread[WrapperStartStopAppMain,5,main] got an uncaught exception. calling CallbackOnExitDefaultHandlers.

com.tc.objectserver.persistence.sleepycat.TCDatabaseException: Environment invalid because of previous exception: com.sleepycat.je.log.DbChecksumException: (JE 3.3.74) Location 0x12/0x7f4423 expected 1755551836 got 421290810

 

OR

 

2011-01-17 09:43:26,240 [WrapperStartStopAppMain] ERROR com.tc.logging.ThreadDumpHandler - Mon Jan 17 09:43:26 CST 2011

 

In addition, see the KCS article linked at the end of this tip for further WARNING entries that may follow this error.
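To check quickly whether a given terracotta-server.log contains the first error shown above, a small script can search for its exception names. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the patterns are copied from the log excerpt above. It deliberately ignores the ThreadDumpHandler line on the assumption that a thread dump alone does not prove corruption.

```python
import re
from pathlib import Path

# Signatures copied from the log excerpt in this tip.
CORRUPTION_PATTERNS = (
    re.compile(r"TCDatabaseException: Environment invalid"),
    re.compile(r"com\.sleepycat\.je\.log\.DbChecksumException"),
)


def find_corruption_lines(log_text):
    """Return (line_number, line) pairs that match a corruption signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CORRUPTION_PATTERNS):
            hits.append((number, line))
    return hits


def scan_log_file(path):
    """Read a terracotta-server.log file and return the matching lines."""
    text = Path(path).read_text(errors="replace")
    return find_corruption_lines(text)
```

Calling scan_log_file with the full path to terracotta-server.log returns an empty list when neither signature is present.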

 

Solution:

This error is thrown when the Terracotta database becomes corrupted. The solution is to remove the corrupted database and let Terracotta rebuild a clean one.

 

1. Stop all Operations Orchestration (OO) related services across the entire cluster:

  • RSGridServer (The OO Clustering Service)
  • RSCentral (The OO Central Service)
  • RSJRAS (The OO Remote Access Service)
  • RSScheduler (The OO Scheduler)
  • RSCluster (The OO Load Balancing Component)

     Note: Include all Centrals, any standalone RAS, and any standalone Cluster components.
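Step 1 can be scripted on each Windows server. The sketch below only builds and runs `net stop` commands for the service names listed above. It assumes the script runs locally on each server with administrative rights, and that not every service is installed on every server, so a non-zero exit code for a missing service is expected. The tip does not state a stop order, so the listed order is used here as an assumption. The function names are hypothetical.

```python
# Service names as listed in step 1 of this tip.
OO_SERVICES = ["RSGridServer", "RSCentral", "RSJRAS", "RSScheduler", "RSCluster"]


def stop_commands(services=OO_SERVICES):
    """Build one `net stop` command (as an argument list) per service."""
    return [["net", "stop", name] for name in services]


def stop_all(run, services=OO_SERVICES):
    """Run every stop command with the supplied runner.

    `run` takes an argument list and returns an exit code,
    for example subprocess.call. Returns {service_name: exit_code}.
    """
    return {command[2]: run(command) for command in stop_commands(services)}
```

For example, after `import subprocess`, calling `stop_all(subprocess.call)` attempts to stop each service and returns the exit code per service name for review.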

 

2. Back up, copy, or rename the current databases, which are found at %ICONCLUDE_HOME%/Clustering/terracotta/terracotta-data on every server running the RSGridServer service.

 

3. Delete every instance of the Terracotta database across the cluster by removing %ICONCLUDE_HOME%/Clustering/terracotta/terracotta-data from every server running the RSGridServer service.
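Steps 2 and 3 can be sketched as two small functions, to be run on each server that hosts RSGridServer once its services are stopped. This is an illustration under assumptions: the function names and the `.bak-<timestamp>` naming scheme are hypothetical, and the backup is written next to the original directory, so there must be enough free disk space for a full copy.

```python
import os
import shutil
import time


def terracotta_data_dir(iconclude_home):
    """Return <ICONCLUDE_HOME>/Clustering/terracotta/terracotta-data."""
    # Refuse an empty or wrong base path before touching anything.
    if not iconclude_home or not os.path.isdir(iconclude_home):
        raise ValueError("not an existing directory: %r" % (iconclude_home,))
    return os.path.join(iconclude_home, "Clustering", "terracotta", "terracotta-data")


def backup_terracotta_data(iconclude_home, stamp=None):
    """Step 2: copy terracotta-data to a timestamped sibling directory."""
    source = terracotta_data_dir(iconclude_home)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = source + ".bak-" + stamp  # hypothetical naming scheme
    shutil.copytree(source, target)  # raises if the target already exists
    return target


def remove_terracotta_data(iconclude_home):
    """Step 3: delete terracotta-data; return False if it was not present."""
    target = terracotta_data_dir(iconclude_home)
    if not os.path.isdir(target):
        return False
    shutil.rmtree(target)
    return True
```

A typical call sequence would be `backup_terracotta_data(os.environ["ICONCLUDE_HOME"])` followed by `remove_terracotta_data(os.environ["ICONCLUDE_HOME"])`. Because the copy raises an exception on failure, the delete is never reached unless the backup completed.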

 

4. Restart the OO services on the first node of the cluster, in the preferred order below:

  • RSGridServer (The OO Clustering Service)
  • RSCentral (The OO Central Service)
  • RSJRAS (The OO Remote Access Service)
  • RSScheduler (The OO Scheduler)
  • RSCluster (The OO Load Balancing Component)

5. Once the first node is up and healthy, bring up each additional cluster node in turn.
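The restart sequence in steps 4 and 5 can be sketched as a loop that starts the services in the preferred order on one node, confirms the node is healthy, and only then moves on to the next node. The tip does not define how to start a service remotely or how to test health, so both are passed in as hypothetical callbacks (`start_service` and `is_healthy`). The function name is also an assumption.

```python
# Preferred start order from step 4 of this tip.
START_ORDER = ["RSGridServer", "RSCentral", "RSJRAS", "RSScheduler", "RSCluster"]


def start_cluster(nodes, start_service, is_healthy):
    """Start services node by node, first node first.

    start_service(node, service) starts one service on one node.
    is_healthy(node) returns True once the node is up and healthy.
    Raises RuntimeError at the first unhealthy node, leaving the
    remaining nodes stopped. Returns the list of nodes started.
    """
    started = []
    for node in nodes:
        for service in START_ORDER:
            start_service(node, service)
        if not is_healthy(node):
            raise RuntimeError(
                "node %s did not become healthy; remaining nodes left stopped" % node
            )
        started.append(node)
    return started
```

On Windows, `start_service` could wrap a command such as `sc \\<node> start <service>`, and `is_healthy` could be as simple as a manual confirmation prompt. Both choices are left to the administrator.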

 

Please review the KCS article, which contains a complete log entry for this error:

http://support.openview.hp.com/selfsolve/document/KM1042353
