Server constantly stops working!

Hi,
I am really in the dark on this one, due mainly to the fact that this server was a "fit and forget" installation once upon a time. We are GW8 on NetWare 6.5 (still!) and only recently did we move from the mobility pack to GMS. At that time, I patched SLES 11 SP1 to SP3, and upgraded to GMS 2.0.1. It did run fine for a week or two, but now, every 48 hours or less, I have to reboot the SLES server. At some point, with no pattern, it stops working. The devices stop receiving mails, and when checking new mails it just comes back as if you haven't had any (no errors). The web portal for GMS shows the services as running, as if all is well.
So I keep rebooting the **** thing every day or two, but where do I start with looking into the reasons?
I've upgraded our GW8 to 8.0.3 HP4 this weekend, to see if it helped - it didn't.

TIA for any suggestions!
Alan
  • BTW, I hadn't even noticed the dashboard wasn't working, not sure when that even stopped working. I guess during the upgrade but I'm sure I've used it since!
    Anyway, I've fixed that now following the article I just found.
    There is 1 caution on the Users - There is a user at the top of my list with "-" and status "disabled". This user doesn't show in the users list though. Is this a rogue user or something? Can I clean it up in the backdoor?
  • briggsb;2341573 wrote:
    Hi,
    I am really in the dark on this one, due mainly to the fact that this server was a "fit and forget" installation once upon a time. We are GW8 on NetWare 6.5 (still!) and only recently did we move from the mobility pack to GMS. At that time, I patched SLES 11 SP1 to SP3, and upgraded to GMS 2.0.1. It did run fine for a week or two, but now, every 48 hours or less, I have to reboot the SLES server. At some point, with no pattern, it stops working. The devices stop receiving mails, and when checking new mails it just comes back as if you haven't had any (no errors). The web portal for GMS shows the services as running, as if all is well.
    So I keep rebooting the **** thing every day or two, but where do I start with looking into the reasons?
    I've upgraded our GW8 to 8.0.3 HP4 this weekend, to see if it helped - it didn't.

    TIA for any suggestions!
    Alan


    Hi Alan,

    First, have you checked that all mount points on the GMS server have sufficient free space left?

    Also, are you familiar with dsapp.sh?
    That is a good tool to use to get a first impression of GMS health. If you haven't, see: https://www.novell.com/communities/coolsolutions/cool_tools/dsapp/

    The dsapp tool can also run maintenance on the GMS database (vacuum and index). If that has not been done recently, the dsapp tool's health check option will report that.

    In any case let us know what the health check option is reporting on the GMS server.

    Cheers,
    Willem
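The free-space check suggested above can be sketched in Python as follows. This is only an illustration, not part of GMS or dsapp: the list of paths is an assumption (adjust it to the mount points on your own server), and the 90% threshold is arbitrary. It also reports inode usage, because a filesystem can run out of inodes while still showing free space.

```python
import os

# Assumed locations worth checking; only the datasync log path comes
# from this thread, the rest are guesses for a typical SLES install.
PATHS = ["/", "/var", "/var/log/novell/datasync", "/var/lib/pgsql"]


def usage_percent(path):
    """Return (space_pct_used, inode_pct_used) for the filesystem holding path."""
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    # Same basis as df: used / (used + available to non-root users).
    denom = used_blocks + st.f_bavail
    space_pct = 100.0 * used_blocks / denom if denom else 0.0
    # Some filesystems report zero total inodes; treat that as 0% used.
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0
    return space_pct, inode_pct


def report(paths, threshold=90.0):
    """Build one summary line per existing path, flagging anything near full."""
    lines = []
    for p in paths:
        if not os.path.exists(p):
            continue  # skip paths that do not exist on this machine
        space, inodes = usage_percent(p)
        flag = "  <-- check" if max(space, inodes) >= threshold else ""
        lines.append("%-28s space %5.1f%%  inodes %5.1f%%%s" % (p, space, inodes, flag))
    return lines


if __name__ == "__main__":
    print("\n".join(report(PATHS)))
```

Run it as root on the GMS server while the problem is occurring, since usage right after a reboot may look healthier than it was at the time of the failure.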
  • Hi Willem. Many thanks for that, I'd never come across it, wish I had!
    It's straight away helped sort a few things, but it found an issue, as suspected, with a user that no longer exists. How do I remove all trace of that user, if they don't show up on the GUI (web/admin portal)?
    There's also a couple of user FDN errors "user xxx has possible incorrect FDN". I'm not sure how to solve those either?
    Thanks again though, no doubt this will help me out loads in future!

    EDIT - I think I found what I was after, thanks again - "Select User Issues | Remove
  • briggsb;2341604 wrote:
    Hi Willem. Many thanks for that, I'd never come across it, wish I had!
    It's straight away helped sort a few things, but it found an issue, as suspected, with a user that no longer exists. How do I remove all trace of that user, if they don't show up on the GUI (web/admin portal)?
    There's also a couple of user FDN errors "user xxx has possible incorrect FDN". I'm not sure how to solve those either?


    It's a little gem indeed, this dsapp tool.

    Those user errors can usually be fixed by using the "Fix targets/membershipCache" option in dsapp. The tool automates the tasks that are described here : https://www.novell.com/support/kb/doc.php?id=7012163


    There is a good wiki on the tool (by the maker(s) of the tool) that you can find here: https://github.com/tdharris/dsapp/wiki


    If you have not run the vacuum and re-index before, or not for a long time, do note that the procedures can take a long time (up to an hour) to run through. During the maintenance the GMS services need to be (and will be) shut down.


    While possible, I haven't seen the errors you mention cause the GMS service to stop. Where I've seen that, it's usually due to not having enough free disk space.
    When GMS grinds to a halt... have you checked the logs in /var/log/novell/datasync/* to see if anything specific is mentioned?


    Cheers,
    Willem
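As a starting point for the log check mentioned above, here is a minimal Python 3 sketch that counts suspicious lines per log file, so you can see which log to read first. The directory is the one named in this thread; everything else is an assumption: that the logs are plain text ending in ".log", and that problem lines contain ERROR, CRITICAL or Traceback. The real GMS log format may differ, so adjust the pattern to what you actually see.

```python
import os
import re
from collections import Counter

LOG_DIR = "/var/log/novell/datasync"  # path mentioned in this thread
# Assumed markers for problem lines; tune to the actual log contents.
PATTERN = re.compile(r"ERROR|CRITICAL|Traceback")


def count_matches(log_dir, pattern=PATTERN):
    """Walk log_dir and count lines matching pattern in each *.log file."""
    counts = Counter()
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            if not name.endswith(".log"):
                continue  # skip rotated or compressed files
            path = os.path.join(root, name)
            try:
                with open(path, errors="replace") as fh:
                    for line in fh:
                        if pattern.search(line):
                            counts[path] += 1
            except OSError:
                continue  # unreadable file: ignore and carry on
    return counts


if __name__ == "__main__":
    # Noisiest files first.
    for path, n in count_matches(LOG_DIR).most_common():
        print("%6d  %s" % (n, path))
```

The file with the most hits is not necessarily the culprit, but it narrows down where to look; compare the timestamps in that file against the time the devices stopped syncing.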
  • Morning. I'm on holiday this morning, but noticed my phone has had nothing since 10pm last night. I can't send an email from my iPhone either ("Cannot send mail: an error occurred while delivering this message").
    I've come online briefly to look at it before rebooting the server. Disk space - I assume "df" command shows me what I need to see? If so, lots of free disk space available. I'm now collating the log files to check when I get in later, any particular log I should attack first? Lots in the various subfolders!
    I can login to the web interface OK, services both running, but the dashboard is not happy at all...

    Warning | 12/09 01:04 | Events From Engine Per Minute: There is a large number of events coming from the sync engine | Device Sync Agent | 2
    Warning | 12/09 06:51 | Average Device Request Time: Device requests are taking a long time to complete | Device Sync Agent | 33
    Caution | 12/08 20:50 | Attachment KBs Received from GroupWise: There is a large amount of attachment data being retrieved from GroupWise | GroupWise Sync Agent | 11
    Caution | 12/09 01:06 | Percentage of Events that are Slow (Full Time): Some events are taking an excessively long time to be processed

    Huge thanks for your help with this!
  • My colleague had a mail at 6am this morning, though. He cannot send mails either now. My GroupWise account had lots of mails overnight, but the one on my phone at about 10pm was the last of the emails for 08/12/14.
  • If it just dies and there is disk available then the best recourse is
    to open an SR with Novell.

    Some of the errors you see in the dashboard may be cosmetic, depending
    on your timezone? What Timezone are you in?

    --
    Anders Gustafsson (NKP)
    The Aaland Islands (N60 E20)

  • Hi, UK timezone. I will check that timesync is working, but I'm not sure I can log an SR with Novell. We pay our maintenance, but I don't recall this allowing us to log an SR?