Configuring delayed actions in Service Level Agreements

I'm trying to put a timer on an element so that an email is sent after
the element is Critical for 5 minutes. The email should be sent anytime
the element is critical for more than 5 minutes, but only once for each
individual outage. So, if it is Critical for a continuous 3 hours, then
1 email is sent after the first 5 minutes. If it is down 5 times in an
hour for 5 minutes each, then 5 emails are sent.

In a NOC admin class, I was told this should be done in SLA using an
Objective Type of Outage. I can cause this type of behavior to happen
only once, then it seems to stop working.


Here are the steps performed:

1. Element is set to a condition of Unknown
2. Element is forced into a condition of Critical
3. Wait 5 minutes
4. Email is sent. Log shows script output.
5. Set element back to Unknown.
6. Set element back to Critical.
7. Wait 5 more minutes.
8. Email is not sent. Log does not show script output.

I have tried setting the max number of outages to 0 per the docs so that
every outage is allowed, but that doesn't work at all; even on the first

How should I configure this to get the intended outcome?

Attachment: Outage SLA.PNG

csssl's Profile:


  • Verified Answer

    I'm not saying that the SLA engine is the wrong way to go; I'm just not
    sure of the answer. If you continue with that option (which may or may
    not be the best solution; again, I don't know), you would need to
    contact support.

    Ignoring that, I think you have a couple script options.

    1) When a new alarm comes in that meets your criteria, you put it into
    a global state variable (a hashmap, treemap, etc.). You have another
    script running (e.g., kicked off on one of your adapters via script on
    started) that looks at the contents of that global map. It pulls an
    item from the map (such as an alarm ID or an entire alarm), checks the
    current time against the alarm time, and either performs the
    additional action or leaves the item for later. It then goes to sleep
    for another 30 seconds, a minute, whatever, and wakes up and checks
    again. You may need a temp map: as you pull items off the main map,
    you put the ones that have not timed out (per your logic) into the
    temp map, and just before going to sleep you traverse the temp map and
    put them back into the main map. (That's the general idea; there are
    probably a few ways to do this.)

    2) Another option is that when an alarm comes in that meets your
    criteria, it creates a brand new thread (i.e., the thread contains the
    alarm you are looking at). The thread then goes to sleep for the
    5-minute period. When it wakes up, it looks at the alarm it was given
    and tries to retrieve that alarm from NOC. If it doesn't exist, end
    the thread and you're done. If it does exist and hasn't been worked
    on, then send out another notification or whatever you need/want to
    do.
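
    To make option 1 a bit more concrete, here is a rough, language-neutral
    sketch in Python. It is not NOC script code: PendingAlarms, run_poller
    and notify are made-up names, and it assumes alarm_raised is called once
    when an outage starts and alarm_cleared when it ends. Removing an entry
    as soon as it times out is what gives you one email per outage, and it
    also avoids the need for a separate temp map.

```python
import threading
import time

DELAY_SECONDS = 5 * 60  # how long an alarm must stay open before notifying


class PendingAlarms:
    """Shared map of alarm id -> time the alarm was first seen (seconds)."""

    def __init__(self, delay=DELAY_SECONDS):
        self.delay = delay
        self._first_seen = {}
        self._lock = threading.Lock()

    def alarm_raised(self, alarm_id, now):
        # Record a new outage; keep the original timestamp if already tracked.
        with self._lock:
            self._first_seen.setdefault(alarm_id, now)

    def alarm_cleared(self, alarm_id):
        # The outage ended, so it must not trigger a notification later.
        with self._lock:
            self._first_seen.pop(alarm_id, None)

    def poll(self, now):
        """Return ids that have timed out, removing them so each fires once."""
        with self._lock:
            due = [a for a, t in self._first_seen.items()
                   if now - t >= self.delay]
            for alarm_id in due:
                del self._first_seen[alarm_id]
        return due


def run_poller(pending, notify, stop_event, interval=30):
    # Wake up every `interval` seconds until stop_event is set.
    while not stop_event.wait(interval):
        for alarm_id in pending.poll(time.monotonic()):
            notify(alarm_id)  # hypothetical: send the email here
```

    Passing the current time into poll keeps the timing logic easy to test
    without actually waiting 5 minutes.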

    These are just some quick random ideas. Of course, any/all of this
    needs to be carefully planned out so it doesn't overwhelm the system
    (e.g., 10K threads running/sleeping, or 10K alarms in a tree map being
    looked at). I also don't have a full understanding of your
    implementation or the other requirements.
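
    For option 2, with the load concern in mind, something along these
    lines might work. Again this is only a Python sketch, not NOC script
    code: DelayedNotifier, still_active and notify are hypothetical names,
    and the max_pending cap is my own assumption about how you might keep
    the number of sleeping timers bounded.

```python
import threading


class DelayedNotifier:
    """Start one timer per alarm; notify only if the alarm is still open."""

    def __init__(self, delay, still_active, notify, max_pending=1000):
        self.delay = delay                # seconds to wait, e.g. 5 * 60
        self.still_active = still_active  # hypothetical lookup of the alarm
        self.notify = notify              # hypothetical: send the email
        self.max_pending = max_pending    # assumed cap on sleeping timers
        self._timers = {}
        self._lock = threading.Lock()

    def alarm_raised(self, alarm_id):
        """Return True if a timer was started, False if skipped."""
        with self._lock:
            already_tracked = alarm_id in self._timers
            at_capacity = len(self._timers) >= self.max_pending
            if already_tracked or at_capacity:
                return False
            timer = threading.Timer(self.delay, self._fire, args=(alarm_id,))
            timer.daemon = True
            self._timers[alarm_id] = timer
        timer.start()
        return True

    def _fire(self, alarm_id):
        # Runs on the timer thread once the delay has passed.
        with self._lock:
            self._timers.pop(alarm_id, None)
        # Re-check the alarm; if it is gone there is nothing to do.
        if self.still_active(alarm_id):
            self.notify(alarm_id)
```

    What to do when the cap is hit (drop, queue, log) is a design decision
    this sketch leaves open; here the alarm is simply skipped.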

    NetIQ Consulting may be a good option to help design a solution for this
    also. Anyways, I hope this helps, please update us with your project.


    tisenberg's Profile:

  • I have opened a support ticket on this. Thanks for your input.

    csssl's Profile: