SiteScope 11.24 integrated with BSM 9.25, and events from dependent monitors
We have a SiteScope server running ping monitors against routers (~500) and switches (~3,500) at ~500 sites around the country. All monitors are set up to send an event to BSM when a router or switch is down. We obviously don't want events saying that switches are down when they merely cannot be reached because the router at the same site is down, so we use dependency settings on all switch monitors: each router monitor is the principal monitor for its site's switches, and the switch monitors are dependent monitors.
This works fine, except in some very odd instances.
What happens then is this (starting with a site where router and switches are all in GOOD status):
1. Router goes down -> GOOD to ERROR event sent to BSM.
2. All switch monitors are disabled ("depended upon") -> NO events to BSM.
3. Router comes back up -> ERROR to GOOD event in BSM.
--- So far everything is as expected----
4. Switch monitors are enabled and run -> from ERROR to GOOD events in BSM from all switches that are GOOD.
This is not normal behaviour, since the switch monitors were never in ERROR. They went GOOD-DISABLED-GOOD.
In SiteScope everything is showing correctly, it's the events sent to BSM that are incorrect.
The main issue with this behaviour is that we miss events from switches that actually go down while they are disabled. If one or more switches go down while disabled, SiteScope sends no events for them, so GOOD-DISABLED-ERROR results in no event being sent to BSM. It seems that SiteScope registers a status change for dependent monitors that were GOOD both before disabling and after enabling, but NOT for dependent monitors that were GOOD before disabling and ERROR after enabling.
We have been running SiteScope for 4 years now, have about 18,000 monitors in total, and use dependencies for many of them. 99.9% of the time everything works fine, but during the last year or so I've seen 5-6 instances of this strange behaviour. Since we have a NOC that watches BSM events, it's a real problem when switches are down without any events.
As I understand monitor dependency, a dependent monitor is not run at all while it is disabled because its principal monitor is in ERROR (or whatever status condition is defined for the dependency). Therefore the dependent monitor should not change status while it is disabled.
I have a support case open on this, but I wonder whether anyone else has experienced this or has any comments.
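The expected behaviour described above can be captured in a small model. This is a hypothetical sketch of the poster's expectation, not SiteScope's actual event logic: it assumes an event is sent only when a monitor's new status differs from the last status reported to BSM, and that DISABLED periods are skipped because the monitor does not run. The function name `expected_events` and the status strings are illustrative.

```python
from typing import Iterable, List, Optional, Tuple


# Hypothetical model (assumption, not SiteScope code): send a status-change
# event only when the new status differs from the last reported status.
# DISABLED entries are ignored because a disabled dependent monitor never runs.
def expected_events(history: Iterable[str]) -> List[Tuple[str, str]]:
    events: List[Tuple[str, str]] = []
    last_reported: Optional[str] = None
    for status in history:
        if status == "DISABLED":
            continue  # monitor not run, nothing to compare or report
        if last_reported is not None and status != last_reported:
            events.append((last_reported, status))
        last_reported = status
    return events


print(expected_events(["GOOD", "DISABLED", "GOOD"]))   # []
print(expected_events(["GOOD", "DISABLED", "ERROR"]))  # [('GOOD', 'ERROR')]
```

Under this model, GOOD-DISABLED-GOOD produces no event and GOOD-DISABLED-ERROR produces a GOOD to ERROR event, which is the opposite of what is observed in BSM.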
Based on your description of the issue, it sounds like a SiteScope defect. I suggest you continue working on the case you have already opened. Also, does this behaviour occur only in version 11.24, or in other versions as well? Have you tried to reproduce it in a more recent SiteScope version (e.g. 11.30)?
Customer Support Engineer
Thanks for the feedback.
I agree that this sounds like a defect. I haven't been able to reproduce the error myself, but an HPE engineer told me he had reproduced it on 11.33.
He tested with all monitors in the same folder and ran them all at once. His theory is that SiteScope somehow starts the dependent monitors just before the principal monitor has finished its run, and therefore sets them to ERROR "internally" (without this being logged or the status in the GUI changing). A second status check after the principal has run then disables the dependent monitors, which is why the status shown in SiteScope is correct. But when SiteScope reports status to BSM, it is the first check that defines the status, i.e. the status went GOOD-ERROR-DISABLED-GOOD.
I think this sounds plausible; it would explain the events. Still a defect, of course.
However, what doesn't match this scenario in my case is that the routers and switches are in different folders and have different run settings: routers run at a 5-minute interval, switches at 12 minutes.
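To see why this theory would explain both symptoms, here is a hypothetical sketch (not SiteScope code; all names are illustrative). It assumes the hidden "internal" ERROR is never sent to BSM but still becomes the baseline that the next reported status is compared against.

```python
from typing import List, Optional, Sequence, Tuple


# Hypothetical model of the HPE engineer's theory. Each entry is
# (status, reported): reported=False marks the assumed "internal" ERROR that
# is never logged or sent, but silently replaces the comparison baseline.
def events_with_hidden_baseline(
    history: Sequence[Tuple[str, bool]],
) -> List[Tuple[str, str]]:
    events: List[Tuple[str, str]] = []
    baseline: Optional[str] = None
    for status, reported in history:
        if status == "DISABLED":
            continue  # disabled: not run, baseline unchanged
        if reported and baseline is not None and status != baseline:
            events.append((baseline, status))
        baseline = status  # hidden statuses still update the baseline
    return events


# Switch stays up: a spurious ERROR -> GOOD event reaches BSM.
print(events_with_hidden_baseline(
    [("GOOD", True), ("ERROR", False), ("DISABLED", True), ("GOOD", True)]))
# [('ERROR', 'GOOD')]

# Switch really goes down while disabled: ERROR == baseline, so no event.
print(events_with_hidden_baseline(
    [("GOOD", True), ("ERROR", False), ("DISABLED", True), ("ERROR", True)]))
# []
```

With a stale ERROR baseline, GOOD after re-enabling looks like a recovery, and a real ERROR looks like no change, which matches both the unexpected GOOD events and the missing ERROR events.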
An update on this one: I did some testing with SiteScope 11.41, with the same result as in 11.24.
Based on my testing since this error arose, I think dependency works like this:
1. Before the dependent monitor (DM) is run, the status of the principal monitor (PM) is checked.
2. If the PM's status does not meet the "Depends on" condition, the DM is not run but is set to "Disabled, depended upon...".
3. If the PM's status meets the "Depends on" condition, the DM is run.
4. If the DM's status after the run is GOOD, the DM's status is set to GOOD.
5. If the DM's status after the run is not GOOD, the PM is run to check its status.
6. If the PM is GOOD after that run, the DM's status is set according to its last run.
7. If the PM is not GOOD after that run, the PM's status is changed accordingly and the DM is set to "Disabled, depended upon...".
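The seven steps can be sketched as a single function. This is a hypothetical reconstruction of the inferred logic, not SiteScope's implementation: `run_dependent`, `run_pm`, `run_dm` and `DISABLED` are illustrative names, and the "Depends on" condition is assumed to be "PM is GOOD".

```python
from typing import Callable, Tuple

DISABLED = "Disabled, depended upon..."


# Hypothetical sketch of steps 1-7. run_pm and run_dm stand in for actual
# monitor runs and return a status string. Returns (new_pm, new_dm).
def run_dependent(
    pm_status: str,
    run_pm: Callable[[], str],
    run_dm: Callable[[], str],
) -> Tuple[str, str]:
    # Steps 1-2: check the PM's current status before running the DM.
    if pm_status != "GOOD":  # assumed "Depends on" condition
        return pm_status, DISABLED
    # Step 3: condition met, so the DM is run.
    dm_result = run_dm()
    # Step 4: DM is GOOD, nothing more to check.
    if dm_result == "GOOD":
        return pm_status, "GOOD"
    # Step 5: DM is not GOOD, so the PM is re-run to check its status.
    pm_result = run_pm()
    # Step 6: PM still GOOD, so the DM keeps its own result.
    if pm_result == "GOOD":
        return pm_result, dm_result
    # Step 7: PM has failed; the DM's non-GOOD result is discarded and
    # the DM is disabled instead.
    return pm_result, DISABLED


# Router went down after its last run: PM still GOOD, both now fail.
print(run_dependent("GOOD", lambda: "ERROR", lambda: "ERROR"))
# ('ERROR', 'Disabled, depended upon...')
```

Note that in the step 7 path the DM has already produced its own ERROR result before it is disabled, which is the only point in this flow where an "internal" ERROR could be recorded.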
This all works fine in the SiteScope dashboard; the statuses reflect the steps above. But for events to OBM it doesn't work as expected. SiteScope still sends a GOOD event for the DM the next time it runs with status GOOD, but if its status is not GOOD, no event is sent at all.
So "internally" in SiteScope the DM seems to go GOOD-ERROR-DISABLED-GOOD instead of GOOD-DISABLED-GOOD. This only happens when the PM's status is GOOD before the DM runs and is then set to ERROR in step 7 above.
I'd appreciate input from anyone else who has experienced this!