eDirectory DSMASTER recovery in lab

I have a fairly simple eDirectory tree:
servers srv1, srv2 and srv3, all running eDirectory 9.2.8 on SLES 15 (virtual machines in VMware).
There is only the root partition; srv1 holds the master replica, srv2 and srv3 hold r/w replicas, and roll-forward logs are not enabled.
I wanted to test a disaster recovery for the case that all servers are lost.
So I backed up srv1 with dsbk as described at www.netiq.com/.../bn4jkts.html.


dsbk backup -f /root/[hostname].dsbk -l /root/[hostname].dsbk.log -b -t -w -e [nicipassword]


Then I set up a new VM in an isolated environment with the same OS, eDirectory version and IP address as srv1.
I set up a dummy tree, then first restored NICI:

dsbk restore -f /root/backup/srv1.dsbk -l /root/restore_nici.log -e [nicipassword]

This did not raise any errors. I then restarted eDirectory and restored the tree:

dsbk restore -f /root/backup/srv1.dsbk -l /root/dsbk_restore_tree.log -r -a -e [nicipassword] -o

|==================DSBackup Log: Restore================|
Log file name: /root/dsbk_restore_tree.log
Restore started: 2024-2-6'T15:32:52
Restore file name: /root/backup/srv1.dsbk
Restoring file /var/opt/novell/eDirectory/data/dsnici.bak
NICI RESTORE: "NICI Files have been Restored Successfully"
Starting database restore...
Restoring file /root/backup/srv1.dsbk
Server: \T=TREE\O=service\CN=srv2
   Replica: \T=TREE
      Status: ERROR = -626
Server: \T=TREE\O=service\CN=srv3
   Replica: \T=TREE
      Status: ERROR = -626
Error!: -626
Warning! Roll forward logs have been turned off and reset to the default location
Database restore finished
Completion time 00:00:05
1 Error!

/var/opt/novell/eDirectory/log/ndsd.log:
Command line   restore -f /root/backup/srv1.dsbk -l /root/dsbk_restore_tree.log -r -a -e XXXX -o
Processing command line
Log file name: /root/dsbk_restore_tree.log
Restore started: 2024-2-6'T15:32:52
Restore file name: /root/backup/srv1.dsbk
Restoring file /var/opt/novell/eDirectory/data/dsnici.bak
NICI RESTORE: "NICI Files have been Restored Successfully"
Starting database restore...
Restoring file /root/backup/srv1.dsbk
Error!: -626
Warning! Roll forward logs have been turned off and reset to the default location
Database restore finished
Completion time 00:00:05
1 Error!
DSBK error! -626
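
For repeated lab runs it can help to pull the per-replica status out of the restore log instead of reading it by eye. The following is only a minimal Python sketch that assumes the log layout shown above (Server / Replica / Status lines followed by a top-level "Error!:" line); the function name and the embedded sample are made up for illustration and are not part of dsbk. Error -626 is the eDirectory code for "all referrals failed".

```python
import re

def summarize_dsbk_log(text):
    """Return (replica_errors, overall_errors) parsed from a dsbk restore log.

    replica_errors: list of (server, replica, code) for each failing Status line
    overall_errors: list of codes from top-level "Error!: <code>" lines
    """
    replica_errors = []
    overall_errors = []
    server = replica = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Server:"):
            server = line.split(":", 1)[1].strip()
        elif line.startswith("Replica:"):
            replica = line.split(":", 1)[1].strip()
        elif line.startswith("Status:"):
            m = re.search(r"ERROR\s*=\s*(-?\d+)", line)
            if m:
                replica_errors.append((server, replica, int(m.group(1))))
        else:
            m = re.match(r"Error!:\s*(-?\d+)", line)
            if m:
                overall_errors.append(int(m.group(1)))
    return replica_errors, overall_errors

# Excerpt of the restore log from this post, used as sample input.
SAMPLE = r"""
Starting database restore...
Restoring file /root/backup/srv1.dsbk
Server: \T=TREE\O=service\CN=srv2
   Replica: \T=TREE
      Status: ERROR = -626
Server: \T=TREE\O=service\CN=srv3
   Replica: \T=TREE
      Status: ERROR = -626
Error!: -626
Database restore finished
1 Error!
"""

if __name__ == "__main__":
    per_replica, overall = summarize_dsbk_log(SAMPLE)
    for srv, rep, code in per_replica:
        print(f"{srv}: replica {rep} -> {code}")
    print("top-level errors:", overall)
```

In this lab both failing entries belong to servers that do not exist in the isolated environment, which matches the single top-level -626.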

The error is expected, because the other servers, srv2 and srv3, are not available in the lab.
But ndsstat says "Failed to obtain a NetIQ eDirectory Server connection to srv1.O=novell.DUMMY or NetIQ eDirectory Server is not running".
ndsrepair -P says "The Directory Services Database is closed" and doesn't display any partitions.
How do I proceed from here to get a running eDirectory again? I can see that there are RST files in /var/opt/novell/eDirectory/data/dib.
I think I have to activate the recovered DIB and remove the other servers, which is also called DSMASTER recovery. But how?
I couldn't find this in the documentation; any hints or links are more than welcome!

  • 0  

    Out of the blue... something like

    dsbk restadv -o -k -l /path/to/logfile

  • 0  

    ...forgot to ask: are there only RST files in the dibdir?

  • 0 in reply to   

    Thank you for the tip! I had to add the parameter -v to your suggested command to skip verification, but now the tree is restored and working, as far as I can see.

    dsbk restadv -o -k -v -l /path/to/logfile

    I also removed the replicas of the other servers using ndsrepair -P -Ad and deleted all objects referencing the old servers using iManager.

    I had tested this some time ago in a single server setup, where these additional steps were not necessary.
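
For anyone who wants to repeat this in a lab, the dsbk calls from this thread can be collected in one place. The Python sketch below only builds the argument lists and prints them; it does not run anything. The flags are copied from the commands posted above, while the function name, the log directory and the log file names are hypothetical placeholders. eDirectory has to be restarted between step 1 and step 2, as described in the original post.

```python
import shlex

def recovery_commands(backup, nici_password, log_dir="/root"):
    """Build the dsbk argument lists used in this thread (nothing is executed).

    The log file names below are placeholders chosen for this sketch.
    """
    return [
        # Step 1: restore NICI from the backup, then restart eDirectory.
        ["dsbk", "restore", "-f", backup,
         "-l", f"{log_dir}/nici.log", "-e", nici_password],
        # Step 2: restore the tree; in the lab this ends with -626 because
        # the other replica servers cannot be reached.
        ["dsbk", "restore", "-f", backup,
         "-l", f"{log_dir}/tree.log", "-r", "-a", "-e", nici_password, "-o"],
        # Step 3: advanced restore as suggested in the reply; -v was added
        # to skip verification.
        ["dsbk", "restadv", "-o", "-k", "-v", "-l", f"{log_dir}/restadv.log"],
    ]

if __name__ == "__main__":
    for argv in recovery_commands("/root/backup/srv1.dsbk", "[nicipassword]"):
        print(shlex.join(argv))
```

Keeping the commands as lists makes it easy to pass them to subprocess.run later, once the sequence has been checked by hand.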

  • 0 in reply to   

    No, the files from the dummy tree were there, too.

    (But now the problem is resolved, see my reply above)

  • 0

    I have a follow-up question about DSMASTER recovery:

    I managed to restore the master server using

    dsbk restadv -o -k -v -l /path/to/logfile

    I also removed the replicas of the other servers using ndsrepair -P -Ad and deleted all objects referencing the old servers using iManager.

    Then I tried adding a new server srv2 as an r/w replica to this tree (ndsconfig add ...). It looks like it's working: after some time the replica state changes from New to On.

    But ndsrepair -E on the added server shows this error (ndsrepair -T shows no errors):

    [1] Instance at /etc/opt/novell/eDirectory/conf/nds.conf:  idv.O=service.IMUP
    Repair utility for NetIQ eDirectory 9.0 - 9.2.8.0000 v40209.00
    DS Version 40209.00  Tree name: TREE
    Server name: .srv1.service

    Size of /var/opt/novell/eDirectory/log/ndsrepair.log = 11032 bytes.

    Preparing Log File "/var/opt/novell/eDirectory/log/ndsrepair.log"
    Please Wait...
    Collecting replica synchronization status
    Start:  Monday, February 19, 2024 03:23:28 PM Local Time
    Retrieve replica status

    Partition: .[Root].
      Replica on server: .srv2.service
      Replica: .srv2.service                    ********** ********
        Server: CN=srv1.O=service               02-19-2024 15:22:11  -609 Remote
          Object: [Root]
      Replica on server: .srv1.service
      Replica: .srv1.service                    02-19-2024 15:21:58

    Finish:  Monday, February 19, 2024 03:23:28 PM Local Time

          Total errors: 1
    NDSRepair process completed.

    Nevertheless, replication from the master to the r/w replica works, but not the other way around.
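
To compare such reports between test runs, the failing direction can be extracted from the ndsrepair -E output. This is a rough Python sketch that assumes the report layout shown above ("Replica on server:" followed by "Server:" lines carrying a date, a time and an error code); the function name and the sample text are illustrative only.

```python
import re

# A "Server:" line that carries a timestamp followed by a negative error code.
SERVER_ERR = re.compile(
    r"^Server:\s+(\S+)\s+\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}:\d{2}\s+(-\d+)"
)

def replica_sync_errors(report):
    """Return (reporting_server, target_server, code) for each failing entry."""
    errors = []
    reporting = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Replica on server:"):
            reporting = line.split(":", 1)[1].strip()
            continue
        m = SERVER_ERR.match(line)
        if m:
            errors.append((reporting, m.group(1), int(m.group(2))))
    return errors

# Excerpt of the report from this post, used as sample input.
SAMPLE = """
Server name: .srv1.service
Partition: .[Root].
  Replica on server: .srv2.service
  Replica: .srv2.service                    ********** ********
    Server: CN=srv1.O=service               02-19-2024 15:22:11  -609 Remote
      Object: [Root]
  Replica on server: .srv1.service
  Replica: .srv1.service                    02-19-2024 15:21:58
      Total errors: 1
"""

if __name__ == "__main__":
    for reporting, target, code in replica_sync_errors(SAMPLE):
        print(f"replica on {reporting} reports {code} for {target}")
```

For the report above this yields one entry: the replica on srv2 reports -609 for srv1, which matches the observation that only the direction from the r/w replica to the master fails.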

    I tried forcing immediate synchronisation with ndstrace as mentioned at https://support.microfocus.com/kb/doc.php?id=7003102 :

    2024/02/19 15:40:56 Start partition sync .TREE. state:[0], type:[1].
    Sync - Start outbound sync with (#=4, state=0, type=0 partition .IMUP.) .srv1.service.TREE..
    Negotiated max packet size 1048576
    Sync - using version 9 on server <.srv1.service.TREE.>.
    Sending to  ----> .srv1.service.TREE.
    Sync - sending updates to server <.srv1.service.TREE.>.
    Send Partition Updates started usingDispatcher=1
    Creating Async Queue with Length 6
    ComputeLowestCompareTime 0x65D0BD1B (2024/02/17 15:05:15, 1, 4660)
    Using Sync Point Type 2, for .TREE. to .srv1.service.TREE.
    Sync - [00008008] <.TREE.> [2009/04/03 15:58:05, 1, 1].
    Sync - [0000800e] <.admin.service.TREE.> [2009/04/03 15:58:05, 1, 80].
    Sync - [00008047] <.0_1.service.TREE.> [2016/02/24 14:11:41, 1, 1].
    Sync - [00008048] <.srv3.service.TREE.> [2016/02/24 15:33:17, 1, 1].
    Sync - [0000800b] <.srv1.service.TREE.> [2016/07/08 13:26:42, 1, 1].
    Sync - [0000800a] <.srv2.service.TREE.> [2024/02/19 15:13:44, 4, 1].
    Send Partition Updates ChangeCache processing completed in Seconds 4, in MilliSeconds 764 - Total objects 6 Total Changes 18 processed
    Adding packet 1 to queue
    Sending packet 1 to remote server - objects 6 Changes 18
    DCRequest failed, missing mandatory (-609).
    SYNC: Multiple packet Response for [00008047] <.0_1.service.TREE.>, failed, missing mandatory (-609)
    Time taken for send/receive of packet with size 2020, in Seconds 0, in MilliSeconds 3,  Error if any -609
    Dispatcher thread completed in Seconds 4, in MilliSeconds 767,  Error if any -609
    Send Partition Updates completed in Seconds 4, in MilliSeconds 768 - Total objects 6 Total Changes 18, 1 Packet(s) Sent
    Sync - objects: 6, total changes: 18, sent to server <.srv1.service.TREE.> for .TREE..
    Sync - Process: Send updates to <.srv1.service.TREE.> for .TREE. failed, missing mandatory (-609).
    EndUpdateReplicaReply - Number of replicas not found from TV 0

    As you can see, a reference to srv3 is displayed, but this server has not been re-added to the tree yet, only srv2. Also the object 0_1.service.TREE is unknown to me.
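
When the trace gets longer, it is easier to list the objects in the sync packet and the one that was rejected with a small script. The Python sketch below assumes the ndstrace line formats shown above. The check for names like 0_1 is only a heuristic based on the object seen in this trace, not an official rule, and all function names are made up for this example.

```python
import re

# "Sync - [entry id] <object name> [timestamp, ..." lines list objects being sent.
OBJ = re.compile(
    r"^Sync - \[([0-9a-fA-F]{8})\] <([^>]+)> "
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)
# "SYNC: ... [entry id] <object name>, failed, ... (code)" lines name the rejected object.
FAIL = re.compile(
    r"^SYNC: .*\[([0-9a-fA-F]{8})\] <([^>]+)>, failed, .*\((-\d+)\)"
)

def parse_sync_trace(trace):
    """Return (objects, failures) from an ndstrace replication excerpt."""
    objects = {}    # entry id -> (object name, timestamp)
    failures = []   # (entry id, object name, error code)
    for raw in trace.splitlines():
        line = raw.strip()
        m = OBJ.match(line)
        if m:
            objects[m.group(1)] = (m.group(2), m.group(3))
            continue
        m = FAIL.match(line)
        if m:
            failures.append((m.group(1), m.group(2), int(m.group(3))))
    return objects, failures

def looks_like_renamed(name):
    """Heuristic (assumption): leftmost name component has the form <digits>_<digits>."""
    first = name.strip(".").split(".")[0]
    return re.fullmatch(r"\d+_\d+", first) is not None

# Excerpt of the trace from this post, used as sample input.
SAMPLE = """
Sync - using version 9 on server <.srv1.service.TREE.>.
Sync - [00008047] <.0_1.service.TREE.> [2016/02/24 14:11:41, 1, 1].
Sync - [00008048] <.srv3.service.TREE.> [2016/02/24 15:33:17, 1, 1].
Sync - [0000800a] <.srv2.service.TREE.> [2024/02/19 15:13:44, 4, 1].
SYNC: Multiple packet Response for [00008047] <.0_1.service.TREE.>, failed, missing mandatory (-609)
Sync - Process: Send updates to <.srv1.service.TREE.> for .TREE. failed, missing mandatory (-609).
"""

if __name__ == "__main__":
    objects, failures = parse_sync_trace(SAMPLE)
    for entry_id, name, code in failures:
        print(f"{name} [{entry_id}] rejected with {code}")
    for entry_id, (name, stamp) in objects.items():
        if looks_like_renamed(name):
            print(f"suspicious name: {name} [{entry_id}], created {stamp}")
```

The entry IDs and names it reports are the ones to look up in iMonitor.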

    So it was NOT enough to remove the objects referencing the old servers using iManager. How can I make the tree really 'forget' the old servers? I repaired the local database using ndsrepair -R, but that didn't change anything.

  • 0   in reply to 

    There are one or more collision objects in this NDS. The NDS is inconsistent and defective:
    Sync - [00008047] <.0_1.service.TREE.> [2016/02/24 14:11:41, 1, 1].

    Normally only an NDS backline engineer can fix this. If this still exists in the original NDS, it cannot be recovered in this state. I also assume that the transitive vectors must be put in order, and that an earlier time jump or a synthetic time event must be looked for. In my experience, there are also still obituaries that need to be cleared.

    It is impossible to resolve such a situation via a forum; you have to have the NDS in front of you and know which traces and actions deliver results. I have had such situations in the field several times, and we then used the four-eyes principle to get the NDS working again.

    I'm not a fan of TIDs or KM articles. This TID is actually intended for Novell NetWare, but it explains the topic very simply:


    support.novell.com/.../10062001.html


    Greetings George

    “You can't teach a person anything, you can only help them to discover it within themselves.” Galileo Galilei

  • 0   in reply to 

    If you look at the collision object (there might be others, too) with iMonitor, which attributes (and values) get shown? Especially look for "object class" (likely "unknown") and "unknown object class", but list all of them.

  • Verified Answer

    +1 in reply to 

    In the meantime I got a tip from support: in iMonitor, the deleted objects were still listed with state 'Not Present' at the time I tried to add an r/w replica with the same name.

    So I started the recovery from the beginning. This time, after deleting the objects referencing the old replica servers, I ran Agent Configuration > Agent Triggers with different selections until these objects were no longer listed in iMonitor. Then I could add the r/w replicas with the same names without problems.