basic hints about how to deal with a message storm

Hi,

my question is a bit generic =S
I would like to improve my understanding of message storms in OVO:
1) How can I identify that a message storm is happening?
2) What is the message flow in a message storm?
3) Why can it be convenient to shut down the agent during a message storm?
4) Where can I look to identify the root cause of a message storm?

Thanks in advance! I hope these answers will give me a better understanding.
  • Verified Answer

    Hi to all,

    The whitepaper and everything are nice, but there is one major problem: when you shut down the agent on the managed node, you lose all monitoring on that node. Why do this when monitoring the node is critical?

    I think I implemented the perfect solution.
    An ECS circuit is active on the management server and detects a message storm caused by a node. As an automatic action, a custom script runs, finds the template that caused the message storm, and deactivates that template on the node. Once the template (or templates) is deactivated, a message is sent to OVO (with opcmsg) informing you that template "xyz" was deactivated on node "qwe", together with the name of the condition in that template that caused the problem.
    This way you disable the source of the message storm and, at the same time, get what you need to solve the problem (the template name and the condition that caused the storm).
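
    To make the idea more concrete, here is a rough Python sketch of the logic such a script could follow. It is not my actual script: the message tuple layout, the function names, the window and threshold values, and the application name "StormGuard" are all made up for illustration. It only counts messages per (node, template) in a sliding time window and builds the opcmsg argument list; the deactivation step itself is left as a comment because it depends on your agent version.

```python
from collections import defaultdict, deque


def find_storm_sources(messages, window_seconds=60, threshold=100):
    """Return {(node, template): condition} for every source that produced
    at least `threshold` messages within `window_seconds`.

    `messages` is an iterable of (timestamp, node, template, condition)
    tuples. This layout is an assumption, not an OVO data format.
    """
    recent = defaultdict(deque)  # (node, template) -> timestamps in the window
    storms = {}
    for ts, node, template, condition in sorted(messages, key=lambda m: m[0]):
        key = (node, template)
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold and key not in storms:
            # Remember the condition of the message that crossed the threshold.
            storms[key] = condition
    return storms


def build_opcmsg_args(node, template, condition):
    """Build the argument list for the opcmsg notification (not executed here)."""
    text = 'Template "{}" deactivated on node "{}" (condition: {})'.format(
        template, node, condition
    )
    return [
        "opcmsg",
        "severity=warning",
        "application=StormGuard",  # hypothetical application name
        "object=" + template,
        "msg_text=" + text,
    ]


if __name__ == "__main__":
    sample = [(i, "qwe", "xyz", "cond1") for i in range(5)]
    sample += [(0, "qwe", "ok_tpl", "c"), (300, "qwe", "ok_tpl", "c")]
    for (node, template), condition in find_storm_sources(sample, 10, 5).items():
        # Deactivate the template on the node here, using whatever tool your
        # agent version provides, then send the notification.
        print(build_opcmsg_args(node, template, condition))
```

    The "healthy" templates never reach the threshold, so they are left alone and keep monitoring the node.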

    I am saying all this because I have nodes with 30 application logfiles. If a message storm occurs, how else would I know how to solve it for the next time, while still keeping the monitoring provided by the "healthy" templates?

    kind regards,
    alexk