NNMi cancels Connection Down incident even though the problem still exists

Typical situation:

- Two Cisco Nexus switches connected via Ethernet/fiber
- LLDP and CDP are both running as link-layer discovery protocols
- NNMi created the connection using LLDP as topology source (LLDP preferred is the default setting in device profiles)
- “Delete Unresponsive Objects Control” in the discovery setting is set to Zero (never delete object when down)
- Everything nice and green on the map

Now there is a fiber cut on one side (to test this, I pulled the plug):
- NNMi creates an Interface Down incident for the two affected interfaces, and additionally a Connection Down incident.
- On the map the two affected nodes turn yellow and the link between them turns red, everything as it should be.

Ten minutes later, while the fiber is still disconnected, I do a configuration poll on either one of the affected nodes. I also do a map refresh:
- NNMi cancels all previously created incidents (incident cancelled by: Connection deleted from topology; incident cancelled by: InterfaceUnpolled).
- On the map the link between the nodes has been removed and both nodes turn green, even though the problem still exists!
- I did the same test while forcing NNMi to use CDP as the topology source, with the same result.

The LLDP holdtime on Cisco switches is 120 seconds by default, which means that 120 seconds after a link goes down, the switch removes the neighbor from its neighbor table. NNMi apparently concludes that because a neighbor is no longer seen in the LLDP or CDP table, it no longer exists and can therefore be removed from the topology. That conclusion is of course completely wrong.
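For reference, the aging timers can be checked and adjusted on the switches themselves. The commands below are the usual IOS/NX-OS ones (exact syntax and defaults may vary by platform and release):

```
! Show LLDP status/timers and the current neighbor table
show lldp
show lldp neighbors

! Raise the LLDP holdtime (seconds a neighbor is kept after the last frame)
configure terminal
 lldp holdtime 240

! CDP equivalent (default holdtime is 180 seconds)
 cdp holdtime 240
```

Note this only delays the neighbor aging out of the table; it does not change NNMi's behavior of deleting the connection once the entry is gone.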

The problem is so obvious that I thought it must have something to do with our individual settings. I have gone through everything in my mind, but so far no idea.

I also opened a case (5317849227) at high priority, something I hardly ever do, but this really affects monitoring: a connection that went down during the night was “cleared” by the scheduled configuration poll, and in the morning the problem was still there while NNMi showed everything green.

Running the latest 10.21 Patch 2 on Windows 2012 with the latest device pack.

I just wonder if somebody else has had such an experience?

Thanks Thomas

  • Hi Thomas,

    After chatting internally about the problem you are facing, we think it could be related to a known issue for which a hotfix was delivered on top of 10.2xP1 (TB-NNMI-10.2XP1-DISCOVERY-20170303), but the related fix may not be integrated into P2.
    I can see a discovery hotfix has recently been released for 10.2xP2: HF-NNMI-10.2XP2-DISCOVERY-20170308. It may contain the fix for this.

    In one instance of the issue, NNMi was creating FDB connections on a LAG interface whose member interfaces already had LLDP connections; the LLDP connections were being removed, then the FDB connections were created, and then the LLDP connections came back.

    If a similar scenario is happening in your case, as a workaround you may want to place the devices into a node group and then configure that node group not to be FDB-polled in the discovery configuration.

    Please feel free to share this information with the engineers working on the case you have just raised, in case it helps speed up the resolution.

    The internal references for the related lab cases opened for similar issues are QCIM1B151316 and QCIM1B150980.

    I hope this will help.

    All the best


  • Hi Marie-Noelle

    In my case it is a single-interface connection with ifType=ethernetCsmacd and ifSpeed=40 Gbps, not an ieee8023adLag channeled one.

    So I tested it again:

    - Deleted the involved nodes from NNMi

    - Added them again via seed IP

    - Added their node group to disable FDB in the discovery settings

    - Did a configuration poll on both of them (the topology source was always LLDP)

    - Went to the lab and pulled the fiber connection again

    Same result: the connection turned red, but after a configuration poll on one of the nodes the connection was deleted and everything went green.

    I will share that information in the case.

    Thanks Thomas

  • Verified Answer


    The problem was that NNMi deleted and recreated LLDP-discovered Ethernet L2 connections (the UUID changed) every time a configuration poll was done. If a link was down at that time, the connection was not recreated but deleted, and with it all related incidents, which made monitoring very unreliable.
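    To illustrate the failure mode, here is a minimal, hypothetical sketch (plain Python, not NNMi's actual code) of rediscovery logic that keys incidents to a connection object's UUID, so that a delete-and-recreate cycle cancels the incidents:

```python
# Simplified, hypothetical model of the observed behavior: incidents are
# tied to the connection object's UUID, so deleting the object cancels them.
import uuid


class Topology:
    def __init__(self):
        self.connections = {}   # uuid -> connection name
        self.incidents = {}     # uuid -> list of open incidents

    def discover_connection(self, name):
        cid = str(uuid.uuid4())          # fresh UUID on every (re)discovery
        self.connections[cid] = name
        self.incidents[cid] = []
        return cid

    def raise_incident(self, cid, text):
        self.incidents[cid].append(text)

    def config_poll(self, cid, neighbor_visible):
        """Rediscovery: the old connection object is always deleted.

        Deleting it cascades to its incidents ("Connection deleted from
        topology"). Only if the neighbor is still in the LLDP/CDP table is a
        new connection -- with a new UUID and no incidents -- created.
        """
        name = self.connections.pop(cid)
        cancelled = self.incidents.pop(cid)   # incidents go with the object
        new_cid = self.discover_connection(name) if neighbor_visible else None
        return new_cid, cancelled


topo = Topology()
cid = topo.discover_connection("nexus-a:eth1/1 <-> nexus-b:eth1/1")
topo.raise_incident(cid, "Connection Down")

# Link is down, so the neighbor has aged out of the LLDP table (120 s
# holdtime): the poll deletes the connection and cancels the incident.
new_cid, cancelled = topo.config_poll(cid, neighbor_visible=False)
print(new_cid)      # None: connection gone from the map
print(cancelled)    # ['Connection Down']: incident cancelled despite the fault
```

    With an aged-out LLDP neighbor, the poll deletes the connection together with its open Connection Down incident instead of keeping both, which matches the behavior described above.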

    Thanks Thomas

    Hotfix-NNMI-10.2XP2-DISCOVERY-20170314 has solved the problem.