Error Codes for Identity Governance - Part 2
I think that collecting errors and discussing them can be pretty helpful for any product. I have this dream of finding a product that has all possible errors documented with examples. One day, one day my dream shall be fulfilled. Not any time soon though.
Thus my current approach is to collect the errors I run into with any particular product and save them up to write into an article like this.
You can see some examples of those I have done for IDM drivers in the past:
- Active Directory Driver Error Messages - Part 5
- Active Directory Driver Error Messages - Part 4
- Active Directory Driver Error Messages - Part 3
- Active Directory Driver Error Messages - Part 2
- Active Directory Driver Error Messages - Part 1
- Error Codes of the Novell Identity Manager Driver for JDBC: Part 1 of 4
- Error Codes of the Novell Identity Manager Driver for JDBC: Part 2 of 4
- Error Codes of the Novell Identity Manager Driver for JDBC: Part 3 of 4
- Error Codes of the Novell Identity Manager Driver for JDBC: Part 4 of 4
I was working with a customer and we had all sorts of issues, so I collected all sorts of errors. Let's go look at some more.
For some reason, which I still do not know the root cause of, our centrally managed database just plain went away. Not sure why it was down, but it was, and in the end they had to restore from backup, which seemed a bit extreme. I thought figuring out what happened would be interesting but no one really seemed to show any interest or desire in doing the work, so it never happened. What can you do?
Database Connectivity Issues:
[SEVERE] 2018-07-05 16:40:51 com.netiq.iac.common.logging.IACLoggingUtils logExceptionError - [IG-SERVER] Failed to connect. URI: https://acme.com:8443/dtp/facts/collect?id=2dd85797-0d00-43db-a382-7adacc5cdeac&id=30626c29-b624-45ab-a43b-7cd50f355242&id=69fa9ed9-3262-4a3a-bdfe-5348bf7971eb&id=a5fe797c-67d7-4543-a1aa-5e7d0b142954&id=a95fd893-bafb-474c-b669-7d5852ad4fc3&id=d0693a40-97fd-419a-b204-714bc92d7fb4&id=d693f88a-3b59-40d0-b977-d62cbc23f633&id=d9896f0a-7715-46eb-a0c2-f91c230aba3d&id=e36f8f73-9758-43e3-b3bc-1fefddb4d19a&id=e3f2f491-8801-4cd6-ac74-8c2693ee4e05&id=ec422cf7-9895-43e4-9e3e-2ac3746bd25a, rest service id: dtp_server:FactExecutionService. Please verify that rest server is reachable.
[SEVERE] 2018-07-05 16:40:51 com.netiq.iac.persistence.dao.ara.FactBrokerDAO$CollectionThread run - [IG-SERVER] Failed to parse result set for rest call to https://acme.com:8443/dtp/facts/collect?id=2dd85797-0d00-43db-a382-7adacc5cdeac&id=30626c29-b624-45ab-a43b-7cd50f355242&id=69fa9ed9-3262-4a3a-bdfe-5348bf7971eb&id=a5fe797c-67d7-4543-a1aa-5e7d0b142954&id=a95fd893-bafb-474c-b669-7d5852ad4fc3&id=d0693a40-97fd-419a-b204-714bc92d7fb4&id=d693f88a-3b59-40d0-b977-d62cbc23f633&id=d9896f0a-7715-46eb-a0c2-f91c230aba3d&id=e36f8f73-9758-43e3-b3bc-1fefddb4d19a&id=e3f2f491-8801-4cd6-ac74-8c2693ee4e05&id=ec422cf7-9895-43e4-9e3e-2ac3746bd25a from 0:api_server:DPM service.
com.netiq.common.i18n.LocalizedException: Failed to parse result set for rest call to https://acme.com:8443/dtp/facts/collect?id=2dd85797-0d00-43db-a382-7adacc5cdeac&id=30626c29-b624-45ab-a43b-7cd50f355242&id=69fa9ed9-3262-4a3a-bdfe-5348bf7971eb&id=a5fe797c-67d7-4543-a1aa-5e7d0b142954&id=a95fd893-bafb-474c-b669-7d5852ad4fc3&id=d0693a40-97fd-419a-b204-714bc92d7fb4&id=d693f88a-3b59-40d0-b977-d62cbc23f633&id=d9896f0a-7715-46eb-a0c2-f91c230aba3d&id=e36f8f73-9758-43e3-b3bc-1fefddb4d19a&id=e3f2f491-8801-4cd6-ac74-8c2693ee4e05&id=ec422cf7-9895-43e4-9e3e-2ac3746bd25a from 0:api_server:DPM service.
What I think is interesting in the error is that the class that reports it is com.netiq.iac.common.logging.IACLoggingUtils, which is of course a child of com.netiq.iac, upon which I like enabling full logging. This was making a call to the /dtp endpoint, whose purpose I do not know. There is a daas service (Directory As A Service, I assume; we see the daas.war deploy), but there is no dtp.war that deploys, so it is a component of some other WAR, I am sure.
Then we see the class com.netiq.iac.persistence.dao.ara.FactBrokerDAO$CollectionThread error, which seems likely related to the database persistence layer. I had enabled com.netiq.persist, but there was so much trace for every database operation that it was unworkable to read and I had to disable it. That seems to be the overall database persistence level, whereas this is a more specific database level inside the iac class. In this case it is reporting an error for the DPM Service, and I am not sure what that stands for either. You can see that some of these are errors talking to the REST endpoint.
The main IDG client is a REST based application and is constantly making REST calls to update the screen. (Especially when you have a spinning wheel for a Collect or Publish, it just checks again and again, polling every few seconds.) Thus it will notice fairly quickly when the service is down.
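Since I keep pointing at which class reported each error, here is a minimal Python sketch for pulling that out of a log line. It assumes the layout seen in the samples above (level in brackets, timestamp, class, method, a dash, then the message); the names parse_log_line and is_under are my own, not anything shipped with IDG.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the sample lines in this article:
# [LEVEL] YYYY-MM-DD HH:MM:SS fully.qualified.Class method - message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logger>\S+) (?P<method>\S+) - (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    level: str
    timestamp: str
    logger: str
    method: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None for lines that do not fit
    the assumed layout (stack trace lines, for example)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def is_under(entry: LogEntry, package: str) -> bool:
    """True if the reporting class lives in the given package or below it."""
    return entry.logger == package or entry.logger.startswith(package + ".")
```

Feeding it the first SEVERE line from this section should yield a logger under com.netiq.iac, which is a quick way to decide which package is worth raising the log level on.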
Tomcat, Address already in use:
[INFO] 2018-07-06 11:53:20 org.apache.coyote.AbstractProtocol init - Initializing ProtocolHandler ["https-jsse-nio-8443"]
[SEVERE] 2018-07-06 11:53:20 org.apache.coyote.AbstractProtocol init - Failed to initialize end point associated with ProtocolHandler ["https-jsse-nio-8443"]
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
This is technically not an IDG error per se, rather a Tomcat specific error, but it happened on my IDG server so it could happen to yours as well. Somehow I did not properly stop the Tomcat service and managed to start a second instance. Now in theory systemd, the replacement for SysV init on Linux, is supposed to check for this and not let this happen, and yet here we are. I am beginning to really hate systemd. I know some people love it, but it seems like it causes more pain than it is worth, in my personal opinion. (I know this is actually not a systemd specific problem, it is the start script from NetIQ, but still.)
This was nice because it was obvious what the error was, and easy to troubleshoot. I did a 'ps -ef | grep tomcat' and lo and behold there were two instances. I killed both instances and then started Tomcat back up without a problem.
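If you would rather check before starting Tomcat than clean up afterwards, a small Python sketch like this can tell you whether something is already listening on the port. The port 8443 comes from the log above; checking against 127.0.0.1 and the function name port_in_use are my assumptions, and this is not part of any NetIQ start script.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8443):
        print("Port 8443 is already in use; is another Tomcat still running?")
    else:
        print("Port 8443 looks free.")
```

It only tells you that the port is taken, not by whom, so the 'ps -ef | grep tomcat' step is still how you find the culprit.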
No extra logging needed to be enabled as this is a Tomcat level error.
Permission to modify log file failed:
[SEVERE] 2018-07-06 11:54:18 org.apache.catalina.valves.AccessLogValve open - Failed to open access log file [/opt/netiq/idm/apps/tomcat/logs/localhost_access_log.2018-07-06.txt]
java.io.FileNotFoundException: /opt/netiq/idm/apps/tomcat/logs/localhost_access_log.2018-07-06.txt (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
This is a pretty low level error as well, and again, more of a Tomcat error than an IDG error, but it too could happen to you.
I had accidentally assigned the files root ownership, but Tomcat runs as novlua, which then could not access the files. This was easy enough to fix: I just changed the ownership back to novlua and Tomcat was happy with the log file.
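To find every file that ended up with the wrong owner, something like the following Python sketch works on Linux. The directory and the novlua user come from the error above; the function names are mine, and it only reports, it does not change anything.

```python
import os
import pwd
from pathlib import Path
from typing import List


def owner_name(uid: int) -> str:
    """Resolve a numeric uid to a user name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def files_not_owned_by(directory: str, expected_user: str) -> List[str]:
    """List everything under directory whose owner is not expected_user."""
    wrong = []
    for path in sorted(Path(directory).rglob("*")):
        owner = owner_name(path.lstat().st_uid)
        if owner != expected_user:
            wrong.append(f"{path} (owner: {owner})")
    return wrong


if __name__ == "__main__":
    for line in files_not_owned_by("/opt/netiq/idm/apps/tomcat/logs", "novlua"):
        print(line)
```

Anything it prints is a candidate for a chown back to novlua.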
Global.properties not properly imported:
[SEVERE] 2018-07-09 09:26:37 org.apache.catalina.core.StandardWrapperValve invoke - Servlet.service() for servlet [OSPMainServlet] in context with path [/osp] threw exception
When I initially installed IDG, the remote database on the MS SQL Cluster was not properly configured. The docs give you some SQL statements you can use to prepare it: create the needed DBs and accounts, and assign some permissions. Thus all our database setup calls failed. What had happened was that after we created the databases and configured them with the Liquibase commands that are left in the log files, there were two additional scripts that still needed to be run.
They call the configutil.sh tool with a -script parameter, which tells it to process a script of configutil.sh commands. The first script basically imports the /opt/netiq/idm/apps/tomcat/conf/global.properties file, which has lots and lots of content, into the database as configuration information, into the ISM_GLOBAL_CONFIG table. In our case, that table was completely empty.
The second imports the OSP configuration, in the /opt/netiq/idm/apps/osp/conf/ig-something.properties into the main ism-configuration.properties file in the /opt/netiq/idm/apps/tomcat/conf directory that OSP actually loads and uses.
I am told that the reason some information goes in the ism-configuration.properties file (which I had tried copying the global.properties file content into, based on my IDM Identity Apps experience, but that did not work) is that IDG pretty much always runs in cluster mode. Even if it is just a cluster of a single node.
Thus there is node specific stuff, for this box only, which is needed to bootstrap to the point where the rest can be loaded, and it is contained in the ism-configuration.properties file. Then there is cluster related stuff that all the nodes share, which is stored in a shared location they can all access. In the case of IDG it is the ISM_GLOBAL_CONFIG table. In the case of the Identity Apps in IDM, eDirectory is the shared common database, and the information is stored in attributes of the configuration object under the AppConfig container in the User Application driver. (I think that is part of the reason that, when you run configupdate.sh, you are supposed to add a switch of edition=none to indicate not to use the User Application driver's location as storage. In fact, that is why, if you do not specify the User App driver location, or it does not exist, configupdate.sh will fail to save, as it needs to write to the attribute holding the XML blob that holds all the config information.)
We got all sorts of errors that looked like this, all over the place. I wanted to get as many into the article as possible for the next foolish fellow who makes the same mistakes I made.
However, I am unsure why the installer configures these two other files and then copies them into the useful locations, but this is how they do the install now. Maybe it makes backing out easier? A checkpoint in case you stop your install? Perhaps it knows that if this is the second node, it does not copy anything into the database? I have not tested that theory out, but if you happen to have noticed what happens in that case, please comment and let me know.
DAAS Error 487:
[SEVERE] 2018-07-15 20:16:21 com.netiq.iac.server.rest.ConnectionService testConnection - [IG-SERVER] Test Connection error: DAAS_ERROR: 487 : Target Authentication failure.
This one was fun and very specific to my customer's very odd directory choices. This error was simply because the password I was using was incorrect, and when I tried to log in via the GUI it failed and threw this error. Not a problem and kind of boring. But what happened was, we wanted to use two trees, one being sort of a holding area for all data sources. You see this often at universities, where there may be a Student Information System for students, an HR system for employees, a fundraising system for alumni, and then yet another system for contractors, part-timers, guests and so on.
Even in a simple university setting, 3-5 data sources for users is basically normal. (I worked with one that had 5, we added a 6th, and then dropped one, as they swapped around systems.) Additionally you have the problem that often one user is in two or more of those systems. For example a student who graduated and came back for another course while working at the university could be in three systems. As anyone who deals with data entry knows, errors happen all the time. Differences in naming happen. Thus you need some mechanism to decide how to handle the case where all three systems sent first and last names, but they differ slightly.
Sadly, the easiest way is to decide on priorities. For example, HR makes sure you get paid, and if you are not getting paid, you, the end user, will complain and make sure the data is correct. Then students, then alumni, then contractor systems. In this case, they actually made the attributes multivalued, which caused all sorts of pain collecting into IDG, since these common attributes are supposed to be single valued. (What sense does it make having a multivalued last name? You might have two or three last names, but they should be in a single string, not in three values, I would think.) We fixed that, as you can see in the article series I wrote about IDG, part 3: https://www.netiq.com/communities/cool-solutions/getting-started-identity-governance-part-3/
To summarize, we added an ECMA function to the fields that had multiple values to transform them. We looked at the first character of the inputValue variable. If it was an open square bracket ([) we JSON parsed it, since it was an array, and took only the first value. If not, we accepted the value as is.
Check out that article for the gory details on how we figured that all out.
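The logic of that transform is simple enough to sketch here. The real thing was an ECMAScript function inside IDG; this is only a Python rendering of the same idea, with a function name I made up. The fallback for a value that starts with a bracket but is not valid JSON, and the empty string for an empty array, are my additions rather than something from the original.

```python
import json


def first_value(input_value: str) -> str:
    """Collapse a possibly multivalued attribute to a single value."""
    if input_value.startswith("["):
        try:
            values = json.loads(input_value)
        except json.JSONDecodeError:
            # Assumption: a value that merely starts with "[" is kept as is
            return input_value
        # Take only the first value; an empty array becomes an empty string
        return str(values[0]) if values else ""
    # Single valued already, accept the value as is
    return input_value


if __name__ == "__main__":
    print(first_value('["Carman", "Karman"]'))
    print(first_value("Carman"))
```

Both calls in the example come back with the single string Carman, which is what the single valued attribute in IDG wants to see.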
Back to the issue at hand. The system with everybody in it did not actually have passwords we knew. And we were trying to authenticate to that directory. What made it even funnier is that the password policy was very weird (designed to help generate easy to use initial passwords), and I ended up with an astonishingly weak password while trying to make it accept one.
However the passwords we knew were in a different tree, and we were not authenticating against that tree at all. Those were going to be considered Accounts, not Identities.
No Identities imported, no one has admin role, but can log in:
[WARNING] 2018-07-16 15:39:08 com.netiq.iac.server.j2ee.AuthFilter doFilter - [IG-SERVER] User Geoffrey Carman (uid=gcarman,ou=PEOPLE,dc=acme,dc=com) is authenticated and logged in, but does not have access to the Identity Governance application.
We actually saw this error twice with the same root cause but different circumstances. It is worth discussing both since either could happen to you.
The first was once we got IDG up and running and the web interface let us log in as the Bootstrap Admin (which uses a text file for the username/password). I immediately tried to log in as myself to prove the LDAP authentication was working. It was; however, we had yet to import Identities, so my account was not mapped to anything or to any IDG permissions, and I got the error that I do not have access to the application, which is correct.
Then, after we got the password configured and better understood what we were doing, we saw this one happen again. I was able to sign in, no longer as Bootstrap Admin but as a real user, but our OSP configuration for IDG was pointed at the wrong tree, so the Administrator permissions we had granted were not on this user but rather on the corresponding user in the other tree.
It was nice because this helped us realize the problems we had caused for ourselves.
The Bootstrap Admin account approach is interesting. If you have read my series on reading OSP logs, you might have noticed a couple of places where the OSP in Identity Manager makes reference in the log files to using a file for authentication. But it was never exposed or set up for us to use. In the case of IDG it is configured out of the box, and is your first and only initial login option.
You use the Bootstrap account to configure collecting Identities, which, once in the system, can be assigned permissions within the application (via the Administration tab and the Administrators section). Then you are supposed to log out and come back in as one of those permissioned users, with authentication happening against the LDAP directory of your choice.
Then if you ever lose LDAP connectivity, you can always come back and log in as the Bootstrap Admin to try and fix the system. It is nice having this option, as in the Identity Apps case in IDM I have seen times where we need to fix the LDAP components, but since it reads information via LDAP, it has a hard time getting it all to work and connect. This approach makes it much easier.