bwcole (Trusted Contributor)

SiteScope 11.33 - Skip monitors - research and corrections

The story of Skips

I'm not going into what a skip is; I'm assuming you have already read the documentation and the other posts on skips.

Over the years, we have had monitor skips and adjusted the Log Event Health Check alert so that we get alerted when there are two skip #3 events or one skip #4 event, with the SiteScope restart left at the default of skip #10.  Typically this has given us enough time to log into SiteScope, locate the monitor that is skipping, and disable then re-enable it to stop the skipping.

We started to see an increasing frequency of monitors making it to skip #4, so we investigated. Because our monitor count had grown by about 100%, and about 66% of our monitors were configured with an 8-minute frequency, we thought that the number of thread pools and the number of threads within those pools might need to be reviewed.  We increased both to 400 from the default of 200.  That appeared to help, but we were still getting called in the middle of the night with skip #4.

On investigation, we found a number of monitors that were included in multiple composite monitors, each of which had "run monitors" set, which caused the included monitor to run multiple times.  The monitor did run in under a second, but when it was run three or more times within a minute, it would sometimes skip.

So we corrected our composite monitors so that each monitor was only being run once.  This helped quite a bit in removing all of the skip #1 entries from our skip log, but we were still getting skip #4 and an early morning wake-up call.
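
A minimal sketch of how that kind of overlap could be spotted, assuming you can get a list of composites with their "run monitors" flag and member monitors into a script. The inventory below is hypothetical (the names are placeholders, not from a real SiteScope export), and the sketch assumes that a composite without "run monitors" only reads the status of its members.

```python
from collections import defaultdict

# Hypothetical inventory: composite name -> (run_monitors flag, member monitor names).
# These names are placeholders, not taken from a real SiteScope export.
composites = {
    "App-A-composite": (True, ["db-query-orders", "url-login"]),
    "App-B-composite": (True, ["db-query-orders", "url-search"]),
    "Infra-composite": (False, ["db-query-orders", "ping-db-host"]),
}


def find_multiply_run_monitors(composites):
    """Return {monitor: [composites that run it]} for monitors run by 2+ composites."""
    runners = defaultdict(list)
    for composite, (run_monitors, members) in composites.items():
        if not run_monitors:
            continue  # assumption: without "run monitors" the member is not executed
        for monitor in members:
            runners[monitor].append(composite)
    return {m: sorted(c) for m, c in runners.items() if len(c) > 1}


duplicates = find_multiply_run_monitors(composites)
for monitor, owners in duplicates.items():
    print(f"{monitor} is run by {len(owners)} composites: {', '.join(owners)}")
```

Any monitor reported here would be executed once per listed composite in addition to its own schedule, which is the pattern that was producing the skip #1 entries.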

We investigated the monitors reaching skip #4 and saw that they were running much more frequently than their configured 8-minute cycle.  The majority of the skips were on fairly heavy monitors: Unix Resource Monitors with one counter, and Database Query Monitors, some of which targeted the same server, querying for multiple business counters.

We moved the Unix Resource Monitors that had skipped in the last month to 11 and 13 minutes (prime numbers, to try to limit multi-frequency crossover), and that really helped.
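
The reasoning behind the prime numbers can be illustrated with a little arithmetic. This is only a sketch: it assumes all monitors start aligned at minute 0 and fire exactly on schedule, which a real scheduler will not do, and the 4-minute monitor in the "before" case is just an illustrative value. Two monitors fire in the same minute once every least common multiple of their frequencies.

```python
from itertools import combinations
from math import lcm  # Python 3.9+


def coincidence_minutes(frequencies):
    """Map each pair of frequencies (minutes) to how often both fire in the same minute."""
    return {(a, b): lcm(a, b) for a, b in combinations(sorted(frequencies), 2)}


before = coincidence_minutes([4, 8])       # illustrative: a 4-minute and an 8-minute monitor
after = coincidence_minutes([8, 11, 13])   # frequencies mentioned in this post

print(before)  # {(4, 8): 8}
print(after)   # {(8, 11): 88, (8, 13): 104, (11, 13): 143}
```

Under those assumptions, an 8-minute monitor collides with an 11-minute one only every 88 minutes, while it collides with a 4-minute one on every single run.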

We were still seeing Database query monitors running at a much higher frequency than what they were configured.

We recreated the database query monitors and had the original and newly created monitors run at the same time.  The only difference between the two was that the original was used as a "depends on" in a group definition and the new monitor wasn't.  The new monitor ran at the configured frequency, while the original was running at 1, 3, 5 and 8 minute cycles.  So it appeared that the dependency was driving the original monitor to run more frequently. We opened a support case (SD02367590) to correct this behavior, since being included as a monitor or group dependency should not drive the monitor to run.

We have also found that the DNS Monitor has the same behavior: if the DNS monitor is used as a dependency, it will run more frequently than it is configured to run.

 

From my experience, there are typically three ways for a monitor to be run in a good state: by its own runtime frequency setting, by being included in a composite with "run monitors" set, and by being part of an enterprise business transaction (EBT).  Typically, if a monitor is included in a composite or an EBT, the monitor's runtime frequency is set to 0.

For the error state, additional runs of a monitor would be caused by the monitor's error frequency, by the composite's error frequency if it has one, by an EBT retry, or by any of the three having "verify error" set.

If anyone knows of any other configuration that would cause a monitor to be scheduled to run, please reply to this discussion with the details.

 

Thank you,

Billy

18 Replies
dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Hi,

You mention Error Frequency; "Verify Error" is also something that can cause load and delays, and should only be used for testing or limited use.

Regarding dependency: monitors that have dependent monitors will be run when one or more dependent monitors are run and have a status other than Good.

Make sure you have the right connection settings for Remote Servers, for instance the SSH version for Linux servers. I would also avoid WMI if possible and use NetBIOS instead for Windows servers.

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Thank you

On the dependency: is, or should, a dependency monitor be run to get its status, or should only its most recent state be checked?

The monitors on which we have seen this behavior, of being run at a higher frequency than the runtime frequency they are set to, are: DatabaseQueryMonitor, Unix Resource Monitor and DNS Monitor.

On the DatabaseQueryMonitor, we removed the dependency on it, and it has been running at the runtime frequency ever since.

 

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

If a monitor that has a dependency on another monitor gets status Error, the monitor on which it depends has to be checked by SiteScope. If that monitor's status is Good, it has to be run to get the current status, since this may have changed since the last time it was run. Therefore these monitors may run more often than defined in their run settings.
This is my understanding of how dependency works.

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Thank you Dasomm,

If it is the case that having a dependency on a monitor causes the dependent target to run each time the monitor runs, then I would take that as a behavior that negates the use of dependencies.

From the online help:

"Use this option to prevent redundant alerting from multiple monitors that are monitoring different aspects of a single system."

"Select Good and this monitor is enabled only when the monitor selected in the Depends on box reports a status of Good."

So let's say that you are monitoring a target system with a ping monitor, to ensure that there isn't an issue with network/communication, and then a port ping to the application, which has the ping as a dependency.  No sense checking the application port if the server is DOA.  Then you have two dozen application URL/URL Sequence/Web Service calls, which are all dependent on the application port ping monitor returning Good.

Let's say you have the ping running every 3 minutes, the port monitor every 5 minutes, and the two dozen application monitors running every 10 minutes.  Would the ping run three times, or would it run 29 times, within ten minutes?

Now let's add more weight to the mix: the group that contains the 24 application monitors is dependent on the application database.  The database monitor is configured to run every 8 minutes.  Since there are 24 monitors in the group, all set to 10 minutes, would that cause the database monitor to run 24 times within 10 minutes?

The documentation is very unclear on whether a dependency checks the currently reported state (the monitor does store its current state, since it is displayed in the GUI between run cycles) or whether the "depends on" causes the dependency target to run in order to get its current state.  The latter would make dependencies far more expensive to use and would, I think, negate the usefulness of having dependencies.

Case in point: you have a DNS monitor to ensure that at least one of your DNS servers is responding, and then you have 300 ping monitors.  The DNS monitor is set to check the DNS server cluster every 5 minutes, and the pings run every 3.  In this case, we would see the DNS monitor run 301 to 601 times every 5 minutes; the upper figure applies if the pings run at minute 1 of the DNS monitor's cycle and again at minute 4.  If this is the case, then having dependencies defined in a large (greater than 2000 monitors) environment would be very resource intensive and would cause skips at a paralyzing rate.
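
The counts in these examples can be reproduced with a short sketch. It models only the worst-case hypothesis being asked about (every run of a dependent monitor forces one extra run of the monitor it depends on), not confirmed SiteScope behavior, and the function and parameter names are made up for illustration.

```python
def runs_in_window(freq_min, window_min, first_run_min=None):
    """Count runs at first_run, first_run + freq, ... that fall within (0, window]."""
    first = freq_min if first_run_min is None else first_run_min
    if first > window_min:
        return 0
    return (window_min - first) // freq_min + 1


def worst_case_parent_runs(parent_freq, dependents, window_min):
    """Parent runs if every dependent run also forced one parent run.

    dependents: list of (count, freq_min, first_run_min or None) tuples.
    """
    own = runs_in_window(parent_freq, window_min)
    forced = sum(n * runs_in_window(f, window_min, first) for n, f, first in dependents)
    return own + forced


# Ping (3 min) with one port monitor (5 min) and 24 application monitors (10 min).
ping_runs = worst_case_parent_runs(3, [(1, 5, None), (24, 10, None)], window_min=10)

# DNS (5 min) with 300 pings (3 min): pings firing once, or at minutes 1 and 4.
dns_low = worst_case_parent_runs(5, [(300, 3, None)], window_min=5)
dns_high = worst_case_parent_runs(5, [(300, 3, 1)], window_min=5)

print(ping_runs, dns_low, dns_high)  # 29 301 601
```

The spread between 301 and 601 comes purely from where the 3-minute ping cycle falls inside the 5-minute window, which is why the worst case depends on scheduling phase as well as on frequency.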

I'm going to pose the question on my support case on this matter and see what comes back.

 

Thank you again,

Billy

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

"If it is the case that having a dependency on a monitor causes the dependent target to run each time the monitor runs, then I would take that as a behavior that negates the use of dependencies."

No, that is not what I meant. 
There is no need to run the Ping monitor as long as the dependent monitors are run with status 'Good', is there?

But when one of the dependent monitors goes to 'Error', SiteScope has to verify that the Ping monitor is indeed in 'Good' status. To do that, the Ping monitor has to be run, regardless of its schedule. When the Ping monitor is run and set to 'Error', all the dependent monitors will be disabled and not run until the Ping monitor is 'Good'.

How else can SiteScope ensure that the dependency check is done before the status is set?

Just to clarify, since English is not my native language:
By "dependent monitor" I mean a monitor that is set up with a dependency on another monitor, as opposed to a monitor that does not have a dependency on any other monitor.
The Ping monitor in your example is a typical example of a monitor that is not dependent, but has other monitors dependent on it.
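
To make the described behavior concrete, here is a toy model of it. This is only a sketch of my understanding as stated above, not SiteScope code: the `Monitor` class and `run_dependent` function are invented for illustration, and for brevity only the dependent monitor that triggered the check is disabled.

```python
class Monitor:
    def __init__(self, name, probe, depends_on=None):
        self.name = name
        self.probe = probe            # callable returning "Good" or "Error"
        self.depends_on = depends_on  # the monitor this one depends on, if any
        self.status = "Good"
        self.enabled = True
        self.runs = 0

    def run(self):
        self.runs += 1
        self.status = self.probe()
        return self.status


def run_dependent(monitor):
    """Run a dependent monitor; on a non-Good result, re-check its parent out of schedule."""
    if not monitor.enabled:
        return "Disabled"
    status = monitor.run()
    parent = monitor.depends_on
    if status != "Good" and parent is not None:
        if parent.run() != "Good":
            monitor.enabled = False   # stays disabled until the parent is Good again
            return "Disabled"
    return status


server_up = {"value": True}
ping = Monitor("ping", lambda: "Good" if server_up["value"] else "Error")
service = Monitor("service", lambda: "Good" if server_up["value"] else "Error", depends_on=ping)

run_dependent(service)            # server up: the Ping monitor is not touched
runs_while_good = ping.runs
server_up["value"] = False
result = run_dependent(service)   # server down: Ping is run, the service monitor is disabled
print(runs_while_good, ping.runs, result)
```

In this model the Ping monitor gets extra runs only when a dependent monitor fails, so a healthy environment should see no additional load from dependencies.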

🙂

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Thank you Dasomm,

We will get through this, and thank you for engaging.

With the following monitor/group structure:

Monitor name     Monitor type   Freq (min)   Contains            Depends on
DNS-1            DNS Monitor    0
DNS-2            DNS Monitor    0
DNS-composite    Composite      4            DNS-1, DNS-2
Ping-1           Ping           3
Ping-2           Ping           3
Ping-3           Ping           3
Ping-X           Ping           3
Ping-200         Ping           3
Ping-Group       Group                       Ping-1 . . . 200    DNS-composite

 

How frequently should DNS-1 and DNS-2 run when everything is in a good state?

I would think that DNS-1 and DNS-2 should run every 4 minutes, but we are experiencing DNS-1 and DNS-2 running at inconsistent times, such as twice a minute, every 3, 4 or 5 minutes, and then combinations of each of those, which is causing skips on the DNS monitors when the run cycles occur too close together.

And the question is: what is the expected behavior in the example above?

Am I dealing with known behavior, or have I happened upon a defect that is causing a number of monitors in SiteScope to skip?
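
For anyone wanting to check their own monitors for the same pattern, below is a rough sketch of how run timestamps could be compared against the configured frequency. It assumes you have already pulled the run times out of the monitor history by some means; the timestamps, the `early_runs` name and the 30-second tolerance are all hypothetical.

```python
from datetime import datetime, timedelta


def early_runs(run_times, configured_minutes, tolerance_seconds=30):
    """Return (previous, current, gap) for runs that start sooner than configured."""
    threshold = timedelta(minutes=configured_minutes) - timedelta(seconds=tolerance_seconds)
    ordered = sorted(run_times)
    return [
        (prev, cur, cur - prev)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev < threshold
    ]


# Hypothetical run history for a composite configured to run every 4 minutes.
history = [
    datetime(2018, 1, 1, 2, 0, 0),
    datetime(2018, 1, 1, 2, 4, 0),
    datetime(2018, 1, 1, 2, 4, 30),   # 30 seconds after the previous run
    datetime(2018, 1, 1, 2, 7, 30),   # 3 minutes after the previous run
    datetime(2018, 1, 1, 2, 11, 30),
]

flagged = early_runs(history, configured_minutes=4)
for prev, cur, gap in flagged:
    print(f"{cur:%H:%M:%S} ran {gap} after the previous run")
```

Runs flagged this way are the ones that happen off-schedule and are candidates for being triggered by something other than the monitor's own runtime frequency.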

Thank you,

Billy

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Regarding whether dependency could be a cause of your issue with monitors running more often than their run schedule: are there monitors without a dependency that run outside of their run settings? If so, this cannot be due to dependency.

Regarding your example:
Are Ping monitors really dependent on DNS monitors? Is it not the other way around? Do you have the Ping monitors in a group, with a dependency from each DNS monitor set to that group? Maybe some screenshots would make it easier to understand.

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Thank you Dasomm,

The base monitor that is being used as a "depends on" on a group is the base monitor that is running more often than the runtime frequency on the monitor.

When we remove the depends on from the group, the base monitor runs at the set frequency.

Because of how vague the documentation is regarding how the "depends on" requests the "current state" of a monitor, I wanted to ask the community.

 

The ping monitors are a mix of IP address and host name, mostly host name.  If DNS is having issues, all of the pings that use a host name fail because they are unable to retrieve an IP address, and we don't want hundreds of pings to fail when it is only a DNS server issue.

 

 

 

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

OK, so if I understand you correctly:

- you see a run frequency higher than the run time settings only for monitors with dependent monitors?
- you have Ping Monitors dependent on DNS Monitors, to disable the Ping Monitors when the DNS Monitors fail?

If this boils down to verifying whether monitors that have dependent monitors can run at a higher frequency than set in their run settings, you can easily test this by creating two new monitors on a test server: a Ping monitor with no dependency, and a service monitor with a dependency on the Ping monitor. Set the run time to every 1 hour and run both monitors. Then shut down the server and try to run the service monitor again. You will then see that the Ping monitor is run and the service monitor is disabled.

But this will have an impact only when the dependent monitors often fail. As long as they are Good, there shouldn't be any impact on the other monitor. Are there many failed Ping Monitors in your environment?
What about your composite monitors, how many individual monitors do they contain? Could it be that too many monitors are triggered at the same time, causing timeouts?

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

- We see run frequencies higher than runtime settings for monitors that are being used as a dependency monitor

DNS Monitor composite with a runtime frequency set to 4 minutes

The Ping group has the DNS Monitor as a dependency and contains a few hundred ping monitors, some set to 1-minute and some to 3-minute runtimes.  We created a second DNS monitor composite; the only difference was that the original was used for the ping group dependency.  The second DNS monitor composite ran every 4 minutes.  We then swapped the original for the second DNS monitor composite on the ping group: the original began running every four minutes, and the second started to run more frequently than every 4 minutes, at random cycles.

The DNS monitor runs, sometimes multiple times a minute, then seemingly randomly.

Rarely do we have any ping failures, but when we do, the ping monitors are our first line of alerting.

 

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

We have created new servers, installed SiteScope 11.51 on the servers, exported our monitors from 11.33 and imported into the new 11.51 servers.

We are still seeing the same behavior: our composite for our internal DNS, containing two DNS monitors, with a runtime frequency set to 4 minutes, is running more often than that.

 

[Attached image: monitorhistory.jpg]

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

"Ping group, has the DNS Monitor as a dependency and has a few hundred ping monitors, some are set to 1 minute, or 3 minute runtimes.  We created a second DNS monitor composite, with the only difference is that the original was used for the ping group dependency.  The second DNS monitor composite ran every 4 minutes.  We then switched the original for the second DNS monitor on the ping group, the original began running every four minutes, and the second started to run more frequently than the 4 minutes, at random cycles."

OK, so this shows that somehow dependency causes the increased frequency.
What are the thresholds for the Ping Monitors? Is the dependency's Depends on Condition set to 'GOOD' on the Ping Monitors?
Are the run settings for the individual DNS monitors set to '0'? And is the Comp monitor set to Run monitors?
What does the monitor history look like for the Ping monitors when you see increased freq for the Comp monitor? No Errors/Warnings?

I did a quick test (on SiS 11.51) with a Comp monitor containing two monitors (Run freq 2 minutes on Comp, 0 minutes on individual monitors), and created a Ping monitor (Run freq 3 minutes) with a dependency (Condition=Good) on the Comp monitor.
Monitor history showed that the run freq was aligned with the Run settings.
Then I manipulated the Ping monitor to fail, and saw that this triggered the Comp Monitor to run (outside its freq).
This is of course a tiny-scale test in comparison to your environment (you have hundreds of Ping monitors with a dependency on a single Comp monitor?), but at least on this scale it works as it should.

bwcole (Trusted Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

Dasomm,

Thank you for your response.

The ping thresholds are the defaults:

error    % packets good == 0
error    round trip time == 'n/a'
good     % packets good == 100

The ping group that has the dependency on the composite is set to "good".

 

Support has requested export templates, but I have no clue how to produce them, so I sent them a full monitor report on all the elements within this monitor chain.

 

And the search continues,

Billy

dasomm (Honored Contributor)

Re: SiteScope 11.33 - Skip monitors - research and corrections

The mystery thickens 🙂

Have you considered moving the dependency settings from group level to monitor level? I.e. removing the dependency from the Ping group and setting it on each Ping monitor instead (using a template, of course)?

 
