Mobility Connector intermittently stops overnight

DS Mobility 1.2.5 build 250
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

The server has been running Mobility fine since March 2010, starting on build 39. 10 users, 13 devices, 28% disk used.

No changes since build 250 was installed end of Feb.

For some inexplicable reason, for the last month or so, when the connectors restart overnight to rotate logs (or whatever housekeeping they do), the mobility connector doesn't restart correctly. Running "rcdatasync status" shows everything running fine, but the web admin screen shows the mobility connector status as "stopped". To get everything working again you just need to click the start button next to it, and all is well until the next overnight restart failure, which may be the next night or a few nights down the line. There's nothing out of the ordinary in the logs that I can see; it must have done this 4 or 5 times now with no trigger I can identify.
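In case it helps anyone reproduce this, here's roughly how I check each service individually rather than relying on the aggregate "rcdatasync status" (just a sketch; the individual init script names are assumptions based on a typical Mobility Pack install, so adjust to your box):

```shell
#!/bin/bash
# Sketch of a per-service status check. The datasync-* init script
# names below are assumed from a typical Mobility Pack install;
# "rcdatasync" aggregates scripts like these.

# check <label> <status-command...>
# Runs the status command and prints one summary line based on its
# exit code (LSB status scripts exit 0 when the service is running).
check() {
    if "${@:2}" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: NOT running"
    fi
}

# On the server you would call it per service, e.g.:
#   check datasync-configengine /etc/init.d/datasync-configengine status
#   check datasync-connectors   /etc/init.d/datasync-connectors status
```

Of course this only reports what the init scripts believe, which in my case disagrees with the web admin screen, so it's a starting point rather than an answer.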

Any ideas?

BR,
Mark.
  • MarkDissington,

    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.

    Has your problem been resolved? If not, you might try one of the following options:

    - Visit http://support.novell.com and search the knowledgebase and/or check all
    the other self support options and support programs available.
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.novell.com)

    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://forums.novell.com/faq.php

    If this is a reply to a duplicate posting, please ignore and accept our apologies
    and rest assured we will issue a stern reprimand to our posting bot.

    Good luck!

    Your Novell Product Support Forums Team
    http://forums.novell.com/

  • OK, no suggestions? Bit more info and 'cleaned' logs attached.

The upgrade to 1.2.5 build 250 was done on 11th March 2013, judging by when the mobility log format changed. Since then the nightly "switch logger"/connector restart process has failed on:
    2013-03-18 - manually restarted 9:44 AM
    2013-04-22 - manually restarted 12:51 PM (by server reboot)
    2013-04-25 - manually restarted 1:21 PM (by server reboot)
    2013-05-02 - manually restarted 08:36 AM
    2013-05-09 - manually restarted 09:14 AM
    2013-05-15 - manually restarted 09:33 AM
    2013-05-17 - manually restarted 08:39 AM
    2013-05-22 - manually restarted 08:36 AM

    The only other thing I can see logged around the same time is in /var/log/messages
    --
    May 22 00:30:01 gwms /usr/sbin/cron[15786]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:30:26 gwms su: (to postgres) root on none
    May 22 00:30:27 gwms su: (to postgres) root on none
    May 22 00:31:01 gwms /usr/sbin/cron[16212]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:32:01 gwms /usr/sbin/cron[16242]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:33:01 gwms /usr/sbin/cron[16268]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:34:01 gwms /usr/sbin/cron[16293]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:35:01 gwms /usr/sbin/cron[16319]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:36:01 gwms /usr/sbin/cron[16344]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    May 22 00:36:10 gwms su: (to postgres) root on none
    May 22 00:36:10 gwms su: (to postgres) root on none
    May 22 00:37:01 gwms /usr/sbin/cron[17097]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
    --
    and this in /var/log/datasync/configengine/configengine.log
    --
    2013-05-22 00:30:21.675 INFO [MainThread] [__init__:55] [userID:] [eventID:] [objectID:] [] Got TERM signal
    2013-05-22 00:30:21.675 INFO [MainThread] [__init__:45] [userID:] [eventID:] [objectID:] [] ConfigEngine stopping
    2013-05-22 00:36:14.275 INFO [MainThread] [engine:125] [userID:] [eventID:] [objectID:] [] Starting DataSync ConfigEngine v1.2.5 250.
    2013-05-22 00:36:14.275 INFO [MainThread] [configEngine:21] [userID:] [eventID:] [objectID:] [] Creating CE...
    2013-05-22 00:36:14.305 INFO [MainThread] [configEngine:44] [userID:] [eventID:] [objectID:] [] Initing with storageEngine...
    2013-05-22 00:36:14.319 INFO [MainThread] [storage:59] [userID:] [eventID:] [objectID:] [] Loading all engines...
    2013-05-22 00:36:14.373 INFO [MainThread] [storage:76] [userID:] [eventID:] [objectID:] [] Loading engine: /etc/datasync/configengine/engines/default/engine.xml
    2013-05-22 00:36:14.400 INFO [TargetsCleanup_Thread] [syncengine:202] [userID:] [eventID:] [objectID:] [] Targets cleanup started ...
    2013-05-22 00:36:14.413 INFO [MainThread] [storage:89] [userID:] [eventID:] [objectID:] [] Loading pipeline: default.pipeline1
    2013-05-22 00:36:14.423 INFO [MainThread] [storage:103] [userID:] [eventID:] [objectID:] [] Loading connector: default.pipeline1.groupwise
    2013-05-22 00:36:14.449 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.source.schemamap
    2013-05-22 00:36:14.464 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.source.eventFilter
    2013-05-22 00:36:14.464 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.source.usermapfilter
    2013-05-22 00:36:14.465 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.source.folderTracker
    2013-05-22 00:36:14.465 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.source.transformationFilter
    2013-05-22 00:36:14.465 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.sink.usermapfilter
    2013-05-22 00:36:14.469 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.sink.folderTracker
    2013-05-22 00:36:14.469 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.sink.schemamap
    2013-05-22 00:36:14.469 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.groupwise.sink.transformationFilter
    2013-05-22 00:36:14.470 INFO [MainThread] [storage:103] [userID:] [eventID:] [objectID:] [] Loading connector: default.pipeline1.mobility
    2013-05-22 00:36:14.490 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.mobility.source.usermapfilter
    2013-05-22 00:36:14.507 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.mobility.source.folderTracker
    2013-05-22 00:36:14.507 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.mobility.sink.usermapfilter
    2013-05-22 00:36:14.510 INFO [MainThread] [storage:172] [userID:] [eventID:] [objectID:] [] Loading filter settings: default.pipeline1.mobility.sink.folderTracker
    2013-05-22 00:36:14.512 INFO [MainThread] [directory:129] [userID:] [eventID:] [objectID:] [] Polling LDAP groups every 1800 seconds.
    2013-05-22 00:36:14.514 INFO [MainThread] [odbc:180] [userID:] [eventID:] [objectID:] [] Connecting to database at UID=datasync_user;DATABASE=datasync;DRIVER={DataSync_postgresql};SERVER=localhost;PWD=*****;PORT=5432;READONLY=no
    2013-05-22 00:36:14.909 INFO [MainThread] [odbc:238] [userID:] [eventID:] [objectID:] [] System ODBC version 03.52, using driver: psqlodbcw.so version 08.03.0200 (supports ODBC 03.51)
    2013-05-22 00:36:14.910 INFO [MainThread] [__init__:93] [userID:] [eventID:] [objectID:] [] Connected to database...
    2013-05-22 00:36:14.910 INFO [MainThread] [__init__:99] [userID:] [eventID:] [objectID:] [] Init complete
    2013-05-22 00:36:14.910 INFO [MainThread] [configEngine:46] [userID:] [eventID:] [objectID:] [] Starting ConfigEngine...
    2013-05-22 00:36:14.910 INFO [MainThread] [__init__:36] [userID:] [eventID:] [objectID:] [] ConfigEngine starting
    2013-05-22 00:36:15.597 WARNING [LDAPPoll_thread] [directory:163] [userID:] [eventID:] [objectID:] [] Engine default not available.
    2013-05-22 00:36:24.363 INFO [CP WSGIServer Thread-9] [__init__:1619] [userID:] [eventID:] [objectID:] [] Engine default registered with configengine.
    2013-05-22 00:36:45.780 INFO [TargetsCleanup_Thread] [syncengine:238] [userID:] [eventID:] [objectID:] [] Targets cleanup finished ...
    2013-05-22 08:33:45.980 INFO [CP WSGIServer Thread-11] [__init__:1489] [userID:] [eventID:] [objectID:] [] LDAP enabled. Using LDAP for authentication.
    --
    So my number one suspect is this line: "2013-05-22 00:36:15.597 WARNING [LDAPPoll_thread] [directory:163] [userID:] [eventID:] [objectID:] [] Engine default not available." But that warning appears in the log pretty consistently, even without any noticeable effects.

    Apart from the 'stopped' connector instances listed above, the server runs perfectly fine AFAICT.

    Surely someone else has experienced this? I can't believe it's unique to us; something in build 250 must have caused the logger switch to fail under certain circumstances. I suspect it's no longer waiting for the postgres DB to be ready for some reason.
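    If it is a timing issue, a crude workaround might look something like this: wait until postgres actually accepts connections before restarting the connector. Purely a sketch on my part; the psql probe and the init script name are assumptions, not anything from the product docs:

    ```shell
    #!/bin/bash
    # Hypothetical workaround sketch: poll until PostgreSQL accepts a
    # connection, then restart the connectors. Probe command and init
    # script name are assumptions for illustration.

    # wait_for_pg <max-seconds> <probe-command...>
    # Retries the probe every 2 seconds until it succeeds, or returns
    # non-zero once the deadline has passed.
    wait_for_pg() {
        local deadline=$(( SECONDS + $1 ))
        until "${@:2}" >/dev/null 2>&1; do
            [ "$SECONDS" -ge "$deadline" ] && return 1
            sleep 2
        done
    }

    # On the server, something along the lines of:
    #   wait_for_pg 60 su - postgres -c "psql -c 'SELECT 1'" \
    #     && /etc/init.d/datasync-connectors restart
    ```

    Run from cron shortly after the nightly rotation window, that would at least paper over the failure, though obviously fixing the product's own restart ordering would be the proper answer.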

    TIA,
    Mark.
Hi Mark, try updating to build 299 first and retry. I have not seen the error you mention, and it does seem to be failing because of that. Is that a plain Mobility Pack server?
    Otherwise you might need to open a service request.
  • Hi Kai,

    OK, I've fully patched the server this morning (both OS and Mobility) and will feed back what we find. The upgrade process was smooth, so there shouldn't be any issues from that at least :-)

    Mark.
Thought I'd tempt fate and feed back that the upgrade to DS 1.2.5 build 299 seems to have resolved the connectors not restarting correctly when the nightly logs rotate. It's been 30 days since the upgrade and we haven't had to manually restart the connectors once. We'll leave the cause as one of life's mysteries.

    Mark.