Interesting User App not starting error

0 Likes

Novell Identity Manager has two major components (well more, but you could argue there are these two at least) for managing the identity lifecycle. The first is the more traditional event driven policy side. That takes events from a remote system, filters, processes, and sends them to the Identity Vault. And the converse. Then there is the second component that handles more of user interaction, the Provisioning side. This allows a user to request access to a resource can handle approvals and what not. This has evolved into a much more coherent and comprehensive solution. In fact many of the new features coming in the next release of Identity Manager will be working through the Provisioning side.



The thing about Provisioning is that it is a really big space. There is so much going on with it. I tried to explain some of that complexity in this series of articles:






I am not sure I really covered it, since I missed Roles, Resources, Teams, and much of the DAL in that article series. Anyway, I think we need more articles on the product and some of the various issues involved in running it.



With that in mind, this came up recently. I was working with a client who reported a fairly interesting error. They had moved their servers all running as VMs (Virtual Machines) in XENCenter (Citrix's version of the open source XEN project included with SLES). They actually had moved the entire file system from one SAN to another, always a daunting task! Anyway, they rebooted all the VM's on all the hosts in the entire cluster, all at once. Can you spell disk hammering? Starting 70 or 80 VMs all at once, against a SAN probably blows away any caching benefits on the SAN, and really hits the disk hard!



Anyway the User Application, which happens was being used just for password self service sort of came up, but was giving 404 errors when you hit the web page.



Well, when troubleshooting the User Application, the first place to go look is the server.log file. This is usually (on Linux) in /opt/novell/idm/jboss/server/IDM/log/server.log (if you are using the Provisioning version of User App, then the IDM part of the path is probably replaced with IDMProv, since that is the name of the WAR being used, and is part of the URL).



I thought this was an interesting error for two reasons. One it seemed so odd, and two it is really clearly shown in the trace, that it would be nice to get a published article out there, so the next guy who runs into this and searches Google might find this slightly verbose explanation to make it easier to resolve.



The first error shows up below as "An unexpected exception occurred in the directory layer", which gives a direction to consider. This is probably a connectivity to eDirectory issue. You can see from the stack trace, what functions are running as this exception occurs. Things like:

com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnection.createBaseContext(EboLdapDirectoryConnection.java:308)
com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnection.authenticate(EboLdapDirectoryConnection.java:166)
com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnectionManager.createConnectionArray(EboLdapDirectoryConnectionManager.java:327)
com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnectionManager.<init>(EboLdapDirectoryConnectionManager.java:103)
com.sssw.fw.directory.realm.impl.jndildap.EboJndiLdapDirectoryFactory.createConnectionMgrInstance(EboJndiLdapDirectoryFactory.java:111)



Well sssw is the SilverStream something or other (I dunno what the last 'sw' mean). Then you can see they are jndildap functions, JNDI is the Java Naming and Directory Interface ( http://www.oracle.com/technetwork/java/index-jsp-137536.html ) which came from Sun, and is now from Oracle. It is a directory abstraction layer for writing Java code. This looks to be an LDAP instance of it from its name.



Ok so we probably have some kind of directory connectivity problem.



2010-09-18 15:20:15,219 FATAL [com.sssw.fw.directory.api.EboDirectoryFactory:<clinit>] An unexpected exception occurred in the directory layer.
com.sssw.fw.exception.EboUnrecoverableSystemException: An unexpected exception occurred in the directory layer.
at com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnection.createBaseContext(EboLdapDirectoryConnection.java:308)
at com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnection.authenticate(EboLdapDirectoryConnection.java:166)
at com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnectionManager.createConnectionArray(EboLdapDirectoryConnectionManager.java:327)
at com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnectionManager.<init>(EboLdapDirectoryConnectionManager.java:103)
at com.sssw.fw.directory.realm.impl.jndildap.EboJndiLdapDirectoryFactory.createConnectionMgrInstance(EboJndiLdapDirectoryFactory.java:111)
at com.sssw.fw.directory.api.EboDirectoryFactory$ConnMgrHolder.(EboDirectoryFactory.java:72)
at com.sssw.fw.directory.api.EboDirectoryFactory.getConnMgr(EboDirectoryFactory.java:103)
at com.sssw.fw.core.SystemConfig$1.run(SystemConfig.java:122)
at com.sssw.fw.core.SystemConfig$1.run(SystemConfig.java:120)
at java.security.AccessController.doPrivileged(Native Method)
at com.sssw.fw.core.SystemConfig.loadReadWriteSettings(SystemConfig.java:119)
at com.sssw.fw.core.SystemConfigBase.<init>(SystemConfigBase.java:210)
at com.sssw.fw.core.SystemConfig.<init>(SystemConfig.java:107)
at com.sssw.fw.core.SystemConfig.(SystemConfig.java:70)
at com.sssw.fw.servlet.InitListener.contextInitialized(InitListener.java:95)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3854)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4359)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:761)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:741)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:553)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.apache.catalina.core.StandardContext.init(StandardContext.java:5310)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.web.tomcat.service.TomcatDeployer.performDeployInternal(TomcatDeployer.java:301)
at org.jboss.web.tomcat.service.TomcatDeployer.performDeploy(TomcatDeployer.java:104)
at org.jboss.web.AbstractWebDeployer.start(AbstractWebDeployer.java:375)
at org.jboss.web.WebModule.startModule(WebModule.java:83)
at org.jboss.web.WebModule.startService(WebModule.java:61)
at org.jboss.system.ServiceMBeanSupport.jbossInternalStart(ServiceMBeanSupport.java:289)
at org.jboss.system.ServiceMBeanSupport.jbossInternalLifecycle(ServiceMBeanSupport.java:245)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.system.ServiceController$ServiceProxy.invoke(ServiceController.java:978)
at $Proxy0.start(Unknown Source)
at org.jboss.system.ServiceController.start(ServiceController.java:417)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:210)
at $Proxy45.start(Unknown Source)



Now a little bit later in the trace, we get a much better error, that I usually have only seen in the User Application trace. That is, the "Caused By" part. This turns out to be really nice! I am used to reading Java exceptions in Identity Manager, or in other products, and I usually do not see these Caused By type messages.



Anyone know why or how User Application does this, versus a standard Java application? They are really nice!



Anyway, if you look this one is the dead giveaway. No route to host. Doh! The eDirectory server was not up yet, when the User Application started up. More correctly, when the Jboss Web Application tried to load the IDM war file, and start running it, it could not get to the eDirectory server. Well some of our servers hit the fsck the disk, since they had not been rebooting in several hundred days problem, and that just hammered the SAN's disk even harder, slowing every thing else down. (It was not a really fun restart! Everything came back, but boy did it take a long time!)



Caused by: javax.naming.CommunicationException: 10.2.3.4:636 [Root exception is java.net.NoRouteToHostException: No route to host]
at com.sun.jndi.ldap.Connection.<init>(Connection.java:197)
at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:118)
at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1580)
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2625)
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:288)
at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:175)
at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:193)
at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:136)
at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:66)
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667)
at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:288)
at javax.naming.InitialContext.init(InitialContext.java:223)
at javax.naming.ldap.InitialLdapContext.<init>(InitialLdapContext.java:134)
at com.sssw.fw.directory.realm.impl.jndildap.EboLdapDirectoryConnection.createBaseContext(EboLdapDirectoryConnection.java:296)
... 149 more
Caused by: java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:550)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:353)
at com.sun.net.ssl.internal.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:71)
at com.sssw.fw.directory.realm.impl.jndildap.LdapSocketFactory.createSocket(LdapSocketFactory.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jndi.ldap.Connection.createSocket(Connection.java:314)
at com.sun.jndi.ldap.Connection.<init>(Connection.java:184)
... 162 more




Now that we cannot get to eDirectory to read information, we get the next error, unable to retrieve global configuration settings. These are stored in the AppConfig container that is underneath the User Application driver.



It turns out the User Application drivers serves a number of purposes. One of which is to be a placeholder for the AppConfig container, which actually has quite a bit of 'stuff' in it. The objects that define the Teams, Roles, Resources, Provisioning Request Definitions, Directory Abstraction Layer, and other bits and pieces are stored there.



There are some other interesting things the User App driver does, but this is probably its most important aspect.



Well this error cascades out. Once you cannot load the configuration, things are going to heck in a hand basket for the JBoss server, and its IDM war. So we get a Error Loading framework base configuration into memory error. I.e. Cannot load this sucker without the configuration information to define some of its core components!




2010-09-18 15:20:15,237 ERROR [com.sssw.fw.servlet.Boot:contextInitialized] Unable to retrieve global configuration settings.
2010-09-18 15:20:15,237 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost].[/IDM]:listenerStart] Exception sending context initialized event to listener instance of
class com.sssw.fw.servlet.InitListener
java.lang.RuntimeException: com.sssw.fw.exception.EboDataException: Error loading framework base configuration into memory.
at com.sssw.fw.servlet.InitListener.contextInitialized(InitListener.java:98)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3854)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4359)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:761)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:741)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:553)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.apache.catalina.core.StandardContext.init(StandardContext.java:5310)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.web.tomcat.service.TomcatDeployer.performDeployInternal(TomcatDeployer.java:301)
at org.jboss.web.tomcat.service.TomcatDeployer.performDeploy(TomcatDeployer.java:104)
at org.jboss.web.AbstractWebDeployer.start(AbstractWebDeployer.java:375)
at org.jboss.web.WebModule.startModule(WebModule.java:83)
at org.jboss.web.WebModule.startService(WebModule.java:61)
at org.jboss.system.ServiceMBeanSupport.jbossInternalStart(ServiceMBeanSupport.java:289)
at org.jboss.system.ServiceMBeanSupport.jbossInternalLifecycle(ServiceMBeanSupport.java:245)



Then finally we cascade down to an overall failure of /IDM won't start, at all. This is basically to be expected due to the nature of the problem.



2010-09-18 15:20:15,240 ERROR [org.apache.catalina.core.StandardContext:start] Context [/IDM] startup failed due to previous errors
2010-09-18 15:20:21,156 ERROR [com.sssw.fw.cachemgr.core.EboCacheManager:<init>] Error loading framework base configuration into memory.
com.sssw.fw.exception.EboDataException: Error loading framework base configuration into memory.



Well this is where a User Application enhancement might be suggested. In cases like this, it would be nice if JBoss would try to deploy the WAR again in say 5 minutes. That is probably all the delay we needed in this case. Though with the way those disks were being hammered, maybe an hour would have been wiser.



Regardless, simply using the /opt/novell/idm/stop-jboss.sh script to stop JBoss, and then "nohup /opt/novell/idm/start-jboss.sh &" to restart it, and we were back up and running within 2 minutes.



It is interesting to watch the JBoss log as it starts up. The raw frameworks starts in maybe 10 seconds, but the rest of it, all the dependant pieces and libraries eat a fair bit of time, stretching out into the many minutes if the disk is busy!



One thing to watch out for in traces like this, is running out of memory. For a variety of reasons that I barely support, the shipping configuration in the start-jboss.sh script has minimal memory allocations set.



The defaults are 256 Megabytes of RAM in Java heap. This is not enough to start the Provisioning User Application anymore, as of Roles Based Provisioning Module version 3.7. That brings out an entire other error case, where the initial security roles are initialized (they only run once), but the events cannot be processed as there is not enough memory! Very annoying so make sure you edit your start-jboss.sh to give it at least one gigabyte of RAM or more if you can.



The line looks something like this:



JAVA_OPTS="-server -Xms256M -Xmx256M -XX:MaxPermSize=128m"



I would suggest setting it to something more like:



JAVA_OPTS="-server -Xms1024M -Xmx2048M -XX:MaxPermSize=512m"



This sets a minimum (-Xms) memory space of 1 GB, and max (=Xmx) of 2 GB of RAM. I do not entirely understand what MaxPermSize means but there is a known error about PermGen errors that needs more than 128 Megs of RAM. Of course this pretty much requires a 4 GB RAM server, but who does not have one of those these days just lying around? Nonetheless, you really need WAY more than the shipping 256 MB of RAM that is configured.



This might be a JBoss thing, since many of the patches will over write this file as well, but I still see this as a bad plan, and it actually needs to be fixed before you run the User Application WAR the first time, or else you risk loosing your initial provisioning of access rights within the User Application itself, which gets relatively painful to fix.



Now none of this is very profound, but it provides a nice example of some typical errors you might see in a User Application JBoss servers server.log file, and I think is worth getting out there. Also I am pretty good at remembering that I have solved something in the past, but pretty lousy at remembering at the exact details. This way if I write it down, and get it up on the Intertubes, then I stand a good chance of finding it again when I search for the error.



I have tried to do similar things, with error codes for other drivers and things. You can read about them in some of my other such articles like the following:



AD driver related:











eDirectory driver related:







JDBC Driver Related:









SAP HR driver related:






User Application driver related:




Solving a -614 Error in User App

And many other articles. You can find my entire collection of 200 articles at:
http://wiki.novell.com/index.php/Geoffrey_Carman's_personal_collection



Enjoy, and I hope this helps someone out there!


Labels:

How To-Best Practice
Comment List
  • Pieter,

    Great comment! Thanks that is very interesting, and useful information to have. Now to get you writing some articles! :)
  • Hi,

    First of all: I really like your articles!

    "Caused by" is basically a feature since Java 1.4 (see download.oracle.com/.../Throwable.html)
    In this case, the initialization of the Listener com.sssw.fw.servlet.InitListener fails, but JBoss catches the exception and wraps it into another exception. printStackTrace prints the whole chain to the log file.
    The same applies to servlets, filters, etc...
    If the same exception would be trown in a "normal" application, you would see the "caused by" part of the exception (except for the words "caused by") and the root of the problem should be clear.

    The Permanent Generation is memory allocated by the VM for the Java classes (not the objects) and methods. I think the more different classes (not objects) your app instantiates, the bigger the PermGen should be. I also think different ClassLoaders may have influence on the PermGen space. If one would like more information about the different types of "generations", just google for "tuning garbage collection" and click the first hit :-)

    Btw, tuning the garbage collection might also solve some memory issues, but beware: things may get worse :-)

    Cheers

    Pieter
Related
Recommended