Active Directory Password Troubleshooter Tool - Part 2

1 Likes
In part 1 of this article series I talked about the new Password Troubleshooter tool that comes with IDM 4.5. Over the years we have mocked Novell, then NetIQ for what is known as stealth marketing. This is one of those examples. Novell Technical Support (NTS), those fine people you call when you are having trouble with the Identity Manager product (Whose names I know far too many of. Better if I had never called them, but that is life in the software world. Though it is fun when I visit Provo to ask and go up to speak with them to say hello) have been troubleshooting Password Synchronization issues with Active Directory for more than a decade now. Internally they started writing their own tools to make their lives easier, and finally in IDM 4.5 they released it with the product. In the past, if you called NTS with an issue with Password synchronization they would probably have sent you this tool to use. Now it comes with it.

Why did I call it stealth marketing? Well I only know about it because I was delving deep into the new features in IDM 4.5 for a series of articles on the topic, and that is how I found it. Did anyone else reading this article know that it was added in 4.5? Ya, that is what I thought. I am not sure that is the best model.

If you are interested in my articles about what is new in IDM 4.5 and the 4.5 SP1 patch you can read these. (Yes, I concede it is getting a bit verbose, and I am not quite done yet, still much more to go on 4.5 SP1 but they did add a lot of nice stuff in IDM 4.5 that they did not really call out.)

IDM 4.5:



Designer 4.5:





For the Password Synchronization troubleshooting tool, see the first episode in this series to see how to use it, and how it writes out logs that you can understand.

Now as it happens, I used this the first time, since I had a bizarre problem where synchronization was just not working. Being the arrogant, trained IDM dude that I am, I knew I could solve this on my own. But everything I tried, nothing was working. Finally I turned to the Password Sync diagnostic tool.

I ran through what it found, initially the only thing I found was a permission problem, but that turned out not to be the real issue. Once I resolved the permissions issues I still was not seeing any reported issues.

I knew that the PWFilter module talked to the Remote Loader server running the driver via RPC (Remote Procedure Calls). We knew that this network had tons of firewall rules that were likely blocking most things between networks and around the network. So the fact the tool confirmed that it could talk RPC from the Remote Loader server to the PWFilter on each DC at least eliminated that problem.

Now in the end, this turned out to be something of a negative hint. That is, the remote loader trace were NOT showing things it should show. It was not showing an error rather it was not showing something important. This is a very hard case to find, since you are not seeing what you should, and you need to know what to show in order to know it is missing. This is why I like error message articles and also why I like writing articles that show what it looks like on success. If you do not know what success looks like, how can you identify when something like this is missing.

In the end, my issue was sort of known, sort of documented, but not specifically with the reason why. This is why it was so hard to figure out. The way Password Sync works, requires that machine talking RPC to the PWFilter modules be a member server (or DC) in the same domain. Therefore, by definition, you can only have password sync from one Domain per Remote Loader server, and it has to be the domain the Remote Loader server is a member of (Dangle you participle, I don't care!).

In my case, the troubleshooter logs showed, I actually had 4 AD drivers running on this Remote Loader server. This was primarily because I had 3 domains I was using to synchronize user data, and the fourth was to allow the PowerShell service to be used, in a standalone queue. Before the advent of IDM 4.5 and it's Out of Band synchronization feature, there was no easy way to make events jump the queue if you thought them more important. (Well you could do it via policy, but it had to get to the policy first, to then jump the queue, so chicken, meet egg, and perhaps start cooking).

Thus to solve a timing issue, this customer had standalone PowerShell drivers so that regardless of the state of the main AD Driver queue for each of 2 domains, events that needed to happen via PowerShell could be sent into a separate queue. With the advent of IDM 4.5 and Out of Band sync, we decided to also prioritize within this queue, but that is just a nice add on.

The other problem is that only one PowerShell service per Windows server can connect to a single Remote Loader. So if you had 2 or more Active Directory domains that you wanted to connect and send PowerShell commands via the PSExecute pseudo attribute then, you need to run two Windows servers, each with an Active Directory driver and PowerShell service installed. So in our case, the proliferation of Windows Remote Loader servers would start to get silly. We would need at least three. One for a standalone Password Sync driver only. One each for the two PowerShell services. Now if we want to fail over and have backup Remote Loaders we would need 6. Even in the world of VMs that starts to get a bit silly.

As a side note, enabling Out of Band sync is really easy. In the driver filter, Designer and iManager have a new button to enable it that adds an XML attribute priority-sync="true" for attributes so flagged. Of course Priority Sync is a name already 'taken' by the eDirectory feature that allows for faster synchronization of 'special' attributes you wish to flag. So they had to rename it to Out Of Band sync. I spoke to one of the developers of Priority Sync in eDir who is now working on IDM and he really liked the Priority Sync name, but alas that would generate too much confusion among people, so the name had to change.

But back to my specific Remote Loader server. We finally eliminated the impossible, meaning everything else was just improbable.

I actually forget now, what the final clue was that triggered all this. I think it was just stopping all 5 remote loaders, and restarting just the one I wanted which suddenly started working. I tried loading the others, continued working. This surprised me. Magic fixing! Then I figured, well once is a magic trick, twice is a show, three to get ready and four to go? No nothing like that, I just stopped my password syncing remote loader and restarted it, but this time it failed. So what was the difference? Only the order was different as far as I could tell. This ended up being the key. In speaking with support it has to do with how the RPC Listener/Server connects. The first Remote Loader instance to try, grabs the RPC connections, even if not using it and won't release it. This means we need to start the Password Syncing Remote Loader first always or else it won't work.

Once we did this we got a working trace that I will walk through in a minute. Now of course we could have separated this Remote Loader to a server of it's own, but since we had the two PowerShell services, moving this to yet another server seemed a bit much.

The first thing we tried was setting the other Remote Loader services as Delayed Start.

Looking it up on Microsoft's site, I found this note about using Delayed Start.

Delayed Start:

 The Registry subkeys for services are located in the following path and can control how services are loaded.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<Service name>

sc config [service name] depend= <Dependencies(separated by / (forward slash))>


This allowed for a dependency, but really we needed to be sure the first driver was started and connected before the others start up. We tried setting all the other services to Delayed Start instead of Automatic in the Services Control Panel, but that only buys us a 2 minute delay, in the midst of all the traffic of a startup. In the end, we decided on a script that would load the other 4 services, 5 minutes after the server started up successfully. This bought us enough time to ensure our service connected first and owned the RPC service it needed.

Now as I noted, it is hard to know what you should be looking for when it is just plain absent.

Of course, your GCV's that control Password Synchronization should be set to receive passwords from the application on the Publisher channel.

Let's dissect the trace of a successful driver start so you can see what the elements of interest are:


Driver Start:

	DC=dev,DC=acme,DC=edu
CN=Configuration,DC=dev,DC=acme,DC=edu
CN=Schema,CN=Configuration,DC=dev,DC=acme,DC=edu
DC=DomainDnsZones,DC=dev,DC=acme,DC=edu
DC=ForestDnsZones,DC=dev,DC=acme,DC=edu
default naming context: DC=dev,DC=acme,DC=edu
schema naming context: CN=Schema,CN=Configuration,DC=dev,DC=acme,DC=edu
configuration naming context: CN=Configuration,DC=dev,DC=acme,DC=edu
root domain naming context: DC=dev,DC=acme
DirXML: [06/16/15 21:19:45.21]: ,DC=edu
forest functional level: Windows Server 2003 Forest Mode


First up we see the naming contexts found from the rootDSE (all LDAP drivers do this). We see the schema, configuration, and root domain found, as well as being told the Functional level of the forest. 2003 in this case. There are later forest levels, but all servers need to be ready for it first. The Active Directory shim does not really care that much about forest functional level, so long as it supports it. Currently the latest level is supported by the latest shim which is about all you can ask for. AD uses this as sort of a versioning model for what Schema to expect.

	DirXML: [06/16/15 21:19:45.21]: ADDriver: Connect using ldap_bind: user=SRV_NETIQ, domain=DEV, password=***, method=negotiate, server=addev1.dev.acme.edu, sign=no, seal=no ssl=yes
DirXML: [06/16/15 21:19:45.24]: ADDriver: ldap_bind connection succeeded



This is important, it tells use we connected over SSL, not using Signing or Sealing. In order for passwords to work, the Remote Loader to Active Directory connection has to be secured with SSL. The certificate for this comes from the Active Directory Certificate Authority, and is set to allow LDAP over SSL connections. The connection between the engine and the remote loader should be secured with SSL as well. The certificate for this is from the eDirectory CA and that is the CA Public key in base 64 format that you called out in the Remote Loader configuration.

	DirXML: [06/16/15 21:19:45.24]: ADDriver: [PWD 4784] PassSyncCache::PassSyncCache()
DirXML: [06/16/15 21:19:45.24]: ADDriver: [PWD] PwdCrypt::GetPublicKey()
DirXML: [06/16/15 21:19:45.24]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer()
DirXML: [06/16/15 21:19:45.24]: ADDriver: [PWD] PwdCrypt::GetCspName()
DirXML: [06/16/15 21:19:45.24]: ADDriver: [PWD] PwdCrypt::GetCspName() returned 0X00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::GetPublicKey() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::GetCspName()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::GetCspName() returned 0X00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD 4784] PassSyncCache::CacheInit()
DirXML: [06/16/15 21:19:45.26]: ADDriver: PassSyncCache::StartInitDomainThread()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD 4784] PassSyncCache::StartInitDomainThread() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD 4784] PassSyncCache::CacheInit() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PasswordSync::PasswordSync() hToken = 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PasswordSync() - initializing using MS RPC.
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PassSyncRPC::PassSyncRPC()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PassSyncRPC::StartServerRpc()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PassSyncRPC::StartServerRpc() returned 0x00000000
DirXML: [06/16/15 21:19:45.26]: ADDriver: Connect using ldap_bind: user=SRV_NETIQ, domain=DEV, password=***, method=negotiate, server=addev1.dev.acme.edu, sign=no, seal=no ssl=yes
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD 4340] - InitializeDomainInfo()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] PassSyncRPC::InitializeRpcServer()
DirXML: [06/16/15 21:19:45.26]: ADDriver: [PWD] InitializeRpcServer() - waiting to run...


I will not dissect that too much, just to say, it was these trace items I was missing, but specifically the lines about the StartServerRPC() returned an 0x0000000 message of success.

Before it would not show these lines. I would have preferred if the driver had shown an error message here, in which case I would have zoomed on it right away, but it did not. Instead it did not show anything.


Another useful trace to see is what happens when a password changes:

DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PassSyncPasswordnew()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] - GetDCKey()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] - GetDCKey() returned 0x00000000.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::GetCspName()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::GetCspName() returned 0X00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] PassSyncCache::StorePwdInfo()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - open the cache.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - acquire the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - mutex acquired.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - enumindex 1.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - create the entry JSMITH.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - persist the data.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - timestamp the entry.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] PassSyncCache::StoreCurrentTimeStamp()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] PassSyncCache::StoreCurrentTimeStamp() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - signal the event telling that data exists in the cache.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - release the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - mutex released.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] StorePwdInfo() - close the cache
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 644] PassSyncCache::StorePwdInfo() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PassSyncPasswordnew() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PasswordSync::DataEnum()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PasswordSync::DataEnum().... checking that RPC Server is listening
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PasswordSync::DataEnum().... checking that RPC Server is listening
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPwdInfo()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPwdInfo() Looking for specific Username[(null)]
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - open the cache. Key = SOFTWARE\Novell\PassSync\Data\DEV.ACME.EDU
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - acquire the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - mutex acquired.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - get number of registry keys.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - dwSubKeys[1] dwPrefMaxEntries[-1] *lpdwResumeHandle[0] lpszUserName[(null)].
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - allocate return buffer to hold 1 entries.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - get the next entry.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - get password and time for user JSMITH.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPasswordInformation()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPasswordInformation() - open the cache entry.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPasswordInformation() - Registry UserEnumIndex[1] Passed EnumIndex[0].
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPasswordInformation() - read the cache data.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] - PassSyncCache::QueryValue()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] QueryValue() - allocate 140 byte buffer
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] QueryValue() - read the data.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::QueryValue() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] - PassSyncCache::QueryValue()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] QueryValue() - allocate 514 byte buffer
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] QueryValue() - read the data.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::QueryValue() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::DecryptObjectPassword()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::GetCspName()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::GetCspName() returned 0X00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::CreateKeyContainer() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::DecryptObjectPassword() obtained handle successfully
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::DecryptObjectPassword() obtained session key successfully
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PwdCrypt::DecryptObjectPassword() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPasswordInformation() - close the cache entry.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPasswordInformation() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::StoreCurrentTimeStamp()
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::StoreCurrentTimeStamp() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - save enumeration to reg.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - release the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - mutex released.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - close the cache.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - acquire the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - mutex acquired.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPwdInfo() Read the following users from driver cache******
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436]PassSyncCache::GetPwdInfo() userName[0] JSMITH
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - release the mutex.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] GetPwdInfo() - mutex released.
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD 3436] PassSyncCache::GetPwdInfo() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: ADDriver: [PWD] PasswordSync::DataEnum() returned 0x00000000
DirXML: [06/16/15 21:14:13.58]: Loader: Received document from publicationShim
DirXML: [06/16/15 21:14:13.58]: Loader: XML Document:
DirXML: [06/16/15 21:14:13.58]: <nds dtdversion="2.2">
<source>
<product build="20150311_120000" instance="\ACMEDEV\AcmeDev\Servers\DRIVERSET01\Active Directory University" version="4.0.1.0">AD</product>
<contact>NetIQ Corporation</contact>
</source>
<input>
<modify-password event-id="Active Directory University##14dff14a8cd##0" class-name="user" src-dn="CN=JSMITH,OU=People,DC=dev,DC=acme,DC=edu" password-admin-reset="true">
<association>3159dae838a73a47a97263812d477842</association>
<password><!-- content suppressed --></password>
</modify-password>
</input>
</nds>


I am not sure there is much to say about this part of the trace. As you can see there is a lot of communication going on between the PWFilter server and the Remote Loader server over RPC. They connect and transfer information between them. The password is written to Registry keys and you can see the user name it found (JSMITH) and that the final XDS document of a <modify-password> event is generated onto the Publisher channel.

As you can see, when it works it is pretty clear and straight forward. Where the hard part shows up is when there are no errors and just some of the trace is not there that would tell you what the problem really looks like.

I hope this was helpful to anyone troubleshooting Active Directory password synchronization with Identity Manager, I know I learned a fair bit in the process.

Labels:

How To-Best Practice
Comment List
  • With regards to the RCP port grabbing. As I was suggested by support: setting the "Password Sync Timeout" to '0' will disable the publishing of password and the RCP port grabbing. I did the following test with success:
    * 2 AD drivers on one remote loader.
    * AD driver A has password sync
    * AD driver B has no password sync. I configured "Password Sync Timeout" to '0'
    * Start AD driver B first, then start AD driver A
    * Change your AD password. Password is synchronized via driver A.
  •  
    Hi Geoffrey,

    Thanks a lot for the detailed problem description. I worked with a partner from Belgium on an issue similar like the one you describe where he had one driver doing password sync and others that didn't, all configured in a single host.

    What we figured out is that if you set the parameter "Password Sync Timeout" to 0, the driver doesn't attempt to start the password sync functionality. This should be sufficient to avoid the other drivers that will not do password sync from taking hold of the RPC connection that the one driver actually needs to perform password sync.

    Best regards,

    Rodrigo
Related
Recommended