COSO metric files on NNMi Server are no longer processed by the OMT container performance solution


NNMi 2023.4
OMT234-190-15001
Vertica 12.0.4-10 embedded
RH Linux OS

We have the problem that the COSO metrics are no longer processed after approximately two days of runtime, which results in the NNMi message:
“The Performance SPI Custom Poller Bus Adapter has status Warning because the average input queue duration is between…
Restarting the whole container solves the issue temporarily.

The last file was written to the reprocess directory on March 8th at 3:12 a.m.
The following message first appears in the log on March 7th at 11:35 p.m.:

Line 67888: 2024-03-07 23:35:13.825 WARN [org.apache.pulsar.client.impl.ConnectionPool] (pulsar-client-io-397-1) Failed to open connection to <MASKED CUSTOMERS SYSTEM>:31051
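To pin down when the connection failures start after each restart, the log can be scanned for the first matching warning. The following is only a minimal sketch: the regular expression is derived from the single log line quoted above, the function name is hypothetical, and other log layouts are not covered.

```python
import re
from datetime import datetime

# Pattern assumed from the one log line quoted in this post; adjust it if
# the real log uses a different layout.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) WARN "
    r"\[org\.apache\.pulsar\.client\.impl\.ConnectionPool\]"
    r".*Failed to open connection to"
)


def first_connection_failure(lines):
    """Return (line_number, timestamp) of the first Pulsar connection
    warning in an iterable of log lines, or None if there is none."""
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.search(line)
        if match:
            timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            return number, timestamp
    return None
```

The function accepts any iterable of strings, so an open file handle for the relevant log can be passed in directly. Comparing the returned timestamp with the time of the last container restart would show whether the failure really sets in after roughly two days each time.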

Has anybody had the same problem and knows how to solve it permanently?

  • 0  

    Hello Gero,

    Were all pods running before the restart?

    Did you reboot a master or worker node without first stopping everything manually, as the instructions require?

    Did you perform some connection check (e.g. port connection check to 31051 or re-enable with nnmksutil.ovpl)?
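A port check of this kind can be run from the NNMi server with a few lines of Python. This is a minimal sketch, not an official tool: the function name is hypothetical, and the host has to be replaced with the container host, which is masked in this thread.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `port_is_reachable("<container host>", 31051)` once while metrics are being processed and once after processing has stopped would show whether the port itself becomes unreachable. A successful TCP connection only proves that something is listening; it does not prove that the service behind the port is healthy.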

    BR Allessandro



    Allessandro Soloperto
    ITC GmbH - Senior Consultant
    If this answered your question, please mark it as "Suggest as Answer" or "Verify as Answer".
    If you found this response useful, please give it a "Like".

  • 0 in reply to   

    Hi,

    The processes had been stopped manually before rebooting. The connection check from NNMi to container port 31051 fails. Re-enabling with nnmksutil.ovpl does not make much sense to me, because after a reboot of the container the metric processing works again for approximately two days.

    The pods below are constantly crashing, but they also crash while the metrics are being processed, and a separate vendor case is open for them.

    kubectl get pods -n nom1 -o wide
    NAME                                                   READY   STATUS    RESTARTS          AGE   IP             NODE           NOMINATED NODE   READINESS GATES
    bvd-
    bvd-redis-7dbb7cf88-c6ft5                              3/3     Running   110 (3m17s ago)   25h   172.16.0.136   12345.domain   <none>           <none>
    bvd-itom-di-postload-taskcontroller-6b96f789d5-n5qqq   1/2     Running   1 (24h ago)       24h   172.16.0.154   12345.domain   <none>           <none>
    itom-idm-6c5cfbc47d-5bnd6                              1/2     Running   75 (10m ago)      25h   172.16.0.142   12345.domain   <none>           <none>
    itomelemetry-collector-0                               1/2     Running   298 (3m49s ago)   24h   172.16.0.168   12345.domain   <none>           <none>


    uif-t@an-vnnmm-p01 ~]# kubectl get pods -n core -o wide
    NAME                       READY   STATUS             RESTARTS         AGE   IP             NODE           NOMINATED NODE   READINESS GATES
    aler-idm-559898b56-bjt77   1/2     CrashLoopBackOff   95 (3m42s ago)   25h   172.16.0.123   12345.domain   <none>           <none>
    itom
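To keep an eye on such pods without reading the table by hand, the JSON form of the same listing (`kubectl get pods -n <namespace> -o json`) can be filtered. The sketch below is only an illustration: the function name and the restart threshold of 10 are arbitrary assumptions, and it relies on the standard `status.containerStatuses` fields of the Kubernetes Pod object.

```python
import json


def unhealthy_pods(pods_json, restart_threshold=10):
    """Return (name, ready, total, restarts) for every pod that is not fully
    ready or whose containers restarted at least restart_threshold times.

    pods_json is the text printed by `kubectl get pods -o json`.
    The threshold is an arbitrary illustrative value.
    """
    report = []
    for pod in json.loads(pods_json).get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        ready = sum(1 for status in statuses if status.get("ready"))
        restarts = sum(status.get("restartCount", 0) for status in statuses)
        if ready < len(statuses) or restarts >= restart_threshold:
            report.append((pod["metadata"]["name"], ready, len(statuses), restarts))
    return report
```

Running this periodically and recording the result next to the time of the first Pulsar connection warning might show whether a particular pod restart coincides with the point where metric processing stops.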

    Any more ideas on how to proceed here?

  • 0   in reply to 

    Hi,

    No, perhaps OT support has an idea.

    BR Allessandro


