(DP) Support Tip: DP 9.x - Backup or GUI error pointing to the need to increase max_locks_per_transaction

EDIT 25/May/2018 :

The root cause of this issue has been identified. Please apply the proper fix instead of increasing the lock limit.

The issue is fixed in the following ways :

* DP A.09.08_b113 and above

* Via Hotfix / SSP QCCR2A70420_DP907_r55110

Only DP versions A.09.07, DP A.09.07_b110 and DP A.09.08 are affected.

 

Backup may fail with this error :

[61:4001] Error accessing the database, in line 1909, file /svnstore/unix/panther/dp_907_rel_nightly/src/sm/bsm2/bcsmutil.c.
Database subsystem reports:
"Internal error: DbaXXXX functions."

The GUI may fail with a DbaXXXX error when accessing Devices & Media or performing related tasks.

 

Justification :

In the pg_log/* files, one or more lines similar to the following would appear :

=================================================

2016-09-26 09:00:15 CEST WARNING:  out of shared memory

2016-09-26 09:00:15 CEST ERROR:  out of shared memory

2016-09-26 09:00:15 CEST HINT:  You might need to increase max_locks_per_transaction.

=================================================
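
To check whether a Cell Manager is logging these messages, the pg_log files can be searched for them. The following is a minimal sketch, not an official tool: the function name count_lock_hints is made up for illustration, and the default directory assumes the standard UNIX location of the IDB logs.

```shell
# Sketch: print "<count> <file>" for every log file in the directory.
# The default path is an assumption (default UNIX layout of the IDB).
count_lock_hints() {
    local log_dir="${1:-/var/opt/omni/server/db80/pg/pg_log}"
    local f
    for f in "$log_dir"/*; do
        # Skip sub-directories and the unexpanded glob of an empty directory
        [ -f "$f" ] || continue
        # grep -c prints the number of matching lines (0 if there are none)
        printf '%s %s\n' \
            "$(grep -c -E 'out of shared memory|max_locks_per_transaction' -- "$f")" \
            "$f"
    done
}
```

Running count_lock_hints without arguments scans the default directory; any file with a non-zero count contains the messages shown above.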

Recommendations :

In the postgresql.conf file, the default value of max_locks_per_transaction is 64.

A review by the CPE IDB / Postgres team shows that this is not adequate for the DP demand, so both the proactive and the reactive recommendation is to increase this value.

Small environments :
max_locks_per_transaction = 1024

Medium size and large environments :
max_locks_per_transaction = 4096

NOTE :

Compared to the default of 64 (with max_connections = 100):

max_locks_per_transaction = 1024 will occupy an additional 28MB of memory
max_locks_per_transaction = 4096 will occupy an additional 111MB of memory

If you are unsure, set 4096; it only occupies some additional memory.
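
The memory figures above scale with the size of PostgreSQL's shared lock table, which has room for max_locks_per_transaction * (max_connections + max_prepared_transactions) objects. A small sketch of that arithmetic follows; the function name lock_table_slots is illustrative, and the defaults of 100 connections and 0 prepared transactions are assumptions matching the note above.

```shell
# Sketch: number of object-lock slots in the shared lock table.
# Arguments: max_locks_per_transaction [max_connections] [max_prepared_transactions]
lock_table_slots() {
    local max_locks="$1" max_conn="${2:-100}" max_prepared="${3:-0}"
    echo $(( max_locks * (max_conn + max_prepared) ))
}
```

With 100 connections, lock_table_slots 64 prints 6400, lock_table_slots 1024 prints 102400 and lock_table_slots 4096 prints 409600. Dividing the additional memory quoted above by the additional slots gives something on the order of 300 bytes per slot, which is only an estimate derived from those two figures.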

 

IMPORTANT NOTE : Make a copy of postgresql.conf before you make any changes to it.

If this parameter is not set in postgresql.conf (or it is commented out with #), just add the above line to the end of the postgresql.conf file.
If this parameter is already set, make sure it is set to at least the recommended value above; if it is not, change it to the recommended value.

The config file is located in the following directory (default) on your CM:

WINDOWS : \ProgramData\OmniBack\server\db80\pg\postgresql.conf
UNIX : /var/opt/omni/server/db80/pg/postgresql.conf

This parameter only becomes active after the DP services are restarted.
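
The edit described above can be scripted. The following is only a sketch for a Linux Cell Manager with GNU sed: the function name set_min_max_locks and the .orig backup suffix are made up for illustration. It keeps a copy of the file, appends the parameter when there is no active setting, and raises an active setting only when it is below the wanted value.

```shell
# Sketch: make sure max_locks_per_transaction is at least the wanted value.
# Arguments: <path to postgresql.conf> <wanted value>
set_min_max_locks() {
    local conf="$1" wanted="$2" current
    # Keep a copy of the untouched file (created once, never overwritten)
    [ -e "$conf.orig" ] || cp -p -- "$conf" "$conf.orig" || return 1
    # Value of the last active (uncommented) setting, empty if there is none
    current=$(sed -n -E 's/^[[:space:]]*max_locks_per_transaction[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p' "$conf" | tail -n 1)
    if [ -z "$current" ]; then
        # Not set, or only present as a comment: append to the end of the file
        printf '\nmax_locks_per_transaction = %s\n' "$wanted" >> "$conf"
    elif [ "$current" -lt "$wanted" ]; then
        # Set too low: rewrite the active line in place (GNU sed -i assumed)
        sed -i -E "s/^[[:space:]]*max_locks_per_transaction[[:space:]]*=[[:space:]]*[0-9]+/max_locks_per_transaction = $wanted/" "$conf"
    fi
}
```

On a default UNIX installation this would be called as set_min_max_locks /var/opt/omni/server/db80/pg/postgresql.conf 4096, followed by a restart of the DP services (for example with omnisv -stop and omnisv -start) during a window in which no sessions are running.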

  • IMHO, this solution works great, but is not complete. What it leaves behind is the PostgreSQL autovacuum logging. I get spammed with lines like:

      2017-04-10 10:23:38 CEST LOG:  autovacuum: found orphan temp table "pg_temp_9"."id_95711486382101581" in database "hpdpidb"

    The same IDs over and over again. Although I don't have proof that the 33GiB of logging is caused by max_locks_per_transaction being too small, I am convinced that it is: the logging started shortly after the first GUI error and, just like the GUI error ("PL/pgSQL function "set_temporary_variable" line 17 at SQL statement"), it is about temporary tables/storage.

    I created the following (Linux/bash) one-liner to create the necessary DROP statements:

      grep 'autovacuum: found orphan temp table' -- "$(ls -1t /var/opt/omni/server/db80/pg/pg_log/postgresql-201*.log | head -n 1)" | grep -E -o '"pg_temp_[0-9]+"\."id_[0-9]+"' | sort | uniq | while read -r TEMPTABLE; do echo '/opt/omni/sbin/omnidbutil -run_script <(echo '"'DROP TABLE ${TEMPTABLE}'"') -detail -admin'; done

    It checks the latest PostgreSQL logfile for orphaned temp table logging, extracts the names of the temp tables and produces one command line per temp table to drop it. Once I felt safe actually executing them, I saved the output and had bash execute all those lines.

     

    But that only stops the excessive logging. For actually reclaiming some space, I went with gzipping all logfiles over 10MiB:

      find /var/opt/omni/server/db80/pg/pg_log -maxdepth 1 -name 'postgresql-201*' -type f -size +10M -exec gzip -9 {} \;

  • Hi Alex,

    Will the fix below work for the error below? I can see that the line number is different. Please check and help us. Thanks in advance!!!

     

    [61:4001] Error accessing the database, in line 1718, file ..\brsmutil.c.
    Database subsystem reports:
    "Internal error: DbaXXXX functions."

     

    PG Log:

    2018-05-25 00:20:43 MDT LOG:  automatic vacuum of table "hpdpidb.hpdpidb_app.dp_management_session": could not (re)acquire exclusive lock for truncate scan

  • Hi ,

    What version of Data Protector are you running (omnicheck -patches on Cell Manager)? 

    Regards,
    Sebastian Koehler

  • Patch level         Patch description

    ===========================================

    DPWIN_20234 (A.09.09_115) Core Component

    DPWIN_20234 (A.09.09_115) Core of Integrations component

    DPWIN_20235 (A.09.09_115) Cell Manager Component

    DPWIN_20236 (A.09.09_115) Disk Agent

    DPWIN_20237 (A.09.09_115) General Media Agent

    DPWIN_20238 (A.09.09_115) User Interface

    DPWIN_20230 (A.09.09) HPE 3PAR VSS Agent

    DPWIN_20210 (A.09.09) HPE P6000 / HPE 3PAR SMI-S Agent

    DPWIN_20232 (A.09.09) MS SharePoint Server 2007/2010/2013 Integration

    DPWIN_20224 (A.09.09) MS SQL Integration

    DPWIN_20217 (A.09.09) MS Volume Shadow Copy Integration

    DPWIN_20211 (A.09.09) DB2 Integration

    DPWIN_20227 (A.09.09) Lotus Integration

    DPWIN_20223 (A.09.09) Oracle Integration

    DPWIN_20228 (A.09.09) PostgreSQL Integration

    DPWIN_20233 (A.09.09) SAP R/3 Integration

    DPWIN_20240 (A.09.09_116) Virtual Environment Integration

    DPWIN_20206 (A.09.09) Automatic Disaster Recovery

    DPWIN_20215 (A.09.09) StoreOnce Software Deduplication

    DPWIN_20241 (A.09.09_116) VMware Granular Recovery Extension Agent

    DPWIN_20205 (A.09.09) English Documentation (Guides, Help)

    DPWIN_20234 (A.09.09_115) Pegasus Libraries

    DPWIN_20234 (A.09.09_115) Core Technology Stack

    DPWIN_20235 (A.09.09_115) Cell Server Technology Stack

    DPWIN_20235 (A.09.09_115) Application Server Technology Stack

    DPWIN_20235 (A.09.09_115) Web Services

    DPWIN_20235 (A.09.09_115) Java Runtime Environment Technology Stack

    DPWIN_20235 (A.09.09_115) Job Control Engine Service Dispatcher

    DPWIN_20235 (A.09.09_115) Job Control Engine Service Registry

    Number of patches found: 29.

    Generated at: 2018-05-25 02:15:56

  • Hi Sebastian,

    Just a few hours back I worked with Alex and found a warning related to a timeout in hpdp-idb-cp.log:

    2018-05-24 22:05:47.300 1708 WARNING C-00000000017C8910: hpdpidb/hpdpidb_app@127.0.0.1:53419 Pooler Error: query_wait_timeout

     

    Then I remembered that last night we had a VC upgrade to 5.5u3 and the platform team was cloning this VC. Our cell manager is a VM, so that would have caused the issue.

     

    As of now backups are running fine. If I see any issue I will keep you all posted.

     

    Thanks a lot for your help :)

  • Hi ,

    Thanks for the update.

    I have a comment on the patch information you've shared. It looks like you have all kinds of Online Integration modules installed on the Cell Manager. This is usually not necessary. I would recommend removing any packages that are not needed from the Cell Manager to avoid license-related issues or GUI slowdown. This has no impact on push installations from the Installation Server.

    Regards,
    Sebastian Koehler