Last Access Time not being updated by vCenter Topology job

Hello,

We're currently experiencing what I think is unusual behavior regarding the "Last Access Time" attribute on CIs.

For about a week, a lot of CIs have not been updated (I mean this attribute, and also the "Last Discovered Time" attribute, since nothing has changed on these CIs), even though they still exist on our vCenter, for example. As a result, the aging mechanism starts to kick in (candidate / deletion time set to 3 / 7) and we're losing lots of CIs.
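
To make the timing concrete, here is a minimal sketch of how such thresholds play out. It assumes the 3 / 7 values are days counted from the Last Access Time; the function name and the state labels are illustrative, not UCMDB identifiers.

```python
from datetime import datetime, timedelta

# Assumption: candidate / deletion periods are expressed in days
# and measured from the CI's Last Access Time.
CANDIDATE_DAYS = 3
DELETION_DAYS = 7

def aging_state(last_access: datetime, now: datetime) -> str:
    """Classify a CI by how long ago it was last touched."""
    age = now - last_access
    if age >= timedelta(days=DELETION_DAYS):
        return "deleted"
    if age >= timedelta(days=CANDIDATE_DAYS):
        return "deletion candidate"
    return "active"

now = datetime(2020, 1, 10)  # arbitrary reference date for the example
for days in (1, 4, 8):
    print(days, "day(s) without touch:", aging_state(now - timedelta(days=days), now))
```

With thresholds this short, a CI that misses a week of touches is already gone, which matches the loss of CIs described above.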

I've tried restarting the vCenter Topology by VIM jobs on our vCenters multiple times. I can see all CIs being added to the vector/results in the communication logs, but nothing is updated on the CIs.

I've also tried reducing the "touch time" to 12 hours as suggested in this post, but no luck either: /it_ops_mgt/ucmdb/f/sws-cms_sup/86496/ucmdb-support-tip-discovery-not-updating-ci-attributes---last-access-time-and-last-discovered-time

Also, the affected nodes are not accessed by host connection jobs, so we need them to be updated by the vCenter job at least.

We're on 2019.11.

Thanks for your help,
Best regards,
Yann Pingot

Parents
  • There is a touching mechanism on the probe side, and it is enabled by default. It checks whether a CI has changed since the last synchronization period; if the CI has not changed, it is added to a touch queue, which is pushed to UCMDB once per day. Touching is essentially an update of the Last Access Time.

    Unfortunately, I have found several bugs in the touching mechanism, specifically when the CIs come from integrations. Opening support cases got them resolved, but the patches have to be installed. Everything should be running fine in version 2019.11.

Children
  • This is what I thought; it seems that reducing the touch time to 12 hours helped.

    Should it work for all discovery/integration jobs? For example, manually running a host resources by shell job on a single target updates the Last Access Time as soon as the job is done.
    On the other hand, for a vCenter Topology discovery the attribute is only updated on nodes when the touch is triggered.
    Maybe my analysis is wrong, and this is simply because there is almost always something being updated on a node (I presume changes to its node elements count as a node update).

    So, if I summarize correctly:

    - If something has changed on a node, its Last Access Time is updated immediately.
    - If nothing has changed but the node still exists, it is added to the "touch queue", and the touch is triggered every 24 hours (if the setting has not been modified).
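
    The two rules above can be sketched as a toy model. This is only an illustration of the behaviour as summarized here, not probe code; the class, its attributes, and the 24-hour constant are assumptions made for the example.

```python
from datetime import datetime, timedelta

TOUCH_INTERVAL = timedelta(hours=24)  # assumption: default touch period

class ProbeSketch:
    """Toy model of the probe-side touch behaviour; not UCMDB code."""

    def __init__(self):
        self.last_sent = {}       # ci_id -> attributes last sent to the server
        self.last_access = {}     # ci_id -> Last Access Time seen on the server
        self.touch_queue = set()  # unchanged CIs waiting for the next touch

    def report(self, ci_id, attrs, now):
        if self.last_sent.get(ci_id) != attrs:
            # New or changed CI: sent right away, Last Access Time updated.
            self.last_sent[ci_id] = dict(attrs)
            self.last_access[ci_id] = now
            self.touch_queue.discard(ci_id)
        else:
            # Unchanged CI: nothing is sent, it is only queued for touching.
            self.touch_queue.add(ci_id)

    def flush_touch_queue(self, now):
        # Pushed once per touch interval: only Last Access Time is refreshed.
        for ci_id in self.touch_queue:
            self.last_access[ci_id] = now
        self.touch_queue.clear()

t0 = datetime(2020, 1, 1)
probe = ProbeSketch()
probe.report("vm-1", {"name": "a"}, t0)                       # first discovery
probe.report("vm-1", {"name": "a"}, t0 + timedelta(hours=1))  # unchanged, queued
print(probe.last_access["vm-1"])                              # still t0
probe.flush_touch_queue(t0 + TOUCH_INTERVAL)
print(probe.last_access["vm-1"])                              # refreshed by the touch
```

    In this model, a CI that never changes depends entirely on the touch flush, so if the flush fails, its Last Access Time stops moving.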

    Best regards,
    Yann Pingot

  • Yes, your summary is correct. What you can do is connect to the PostgreSQL database on the probe and check whether the vCenter CI is in the touch queue table. To make the test easier, you can first clean the DFP database with the tools script cleandataflowprobe. If the record exists, check whether it fails when being sent to UCMDB. If it doesn't exist, check why inserting it fails.

    Petko

  • Hello Yann,

    There may also be some performance issues with this job, as it can bring in a lot of CIs.
    Can you paste the <parameter> section of the CommLog here?

  • Hi Bogdan,

    Here you go:

    <params>
    <param param_name="reportLUN" param_value="false" />
    <param param_name="ignoreHostsWithoutHostNames" param_value="false" />
    <param param_name="JOB_ID" param_value="VMware VirtualCenter Topology by VIM" />
    <param param_name="reportBasicTopology" param_value="false" />
    <param param_name="reportDiscoveredOsName" param_value="false" />
    <param param_name="discoverUnknownIPs" param_value="true" />
    <param param_name="runInSeparateProcess" param_value="true" />
    <param param_name="remoteJVMArgs" param_value="" />
    <param param_name="reportPoweredOffVMs" param_value="true" />
    <param param_name="taskType" param_value="regular" />
    <param param_name="reportLayer2connection" param_value="false" />
    <param param_name="maxThreadRuntime" param_value="900000" />
    <param param_name="remoteJVMClasspath" param_value="%minimal_classpath%;../lib/axis.jar;../lib/axis-jaxrpc.jar;../lib/axis-wsdl4j.jar;../runtime/probeManager/discoveryResources/vmware/vim.jar;../runtime/probeManager/discoveryResources/vmware/vim25.jar" />
    </params>

    This vCenter contains fewer than 3000 VMs. The job usually ends in a warning with a message like the one below, which I don't think is a big deal. However, it sometimes fails (quite rarely nowadays) with reconciliation timeouts (the bulk is too big, even with tweaks to the fuse / rates):

    The following CIs were ignored: From Class [usage] 5 CIs were ignored due to Link with ignored end; From Class [interface] 5 CIs were ignored due to Multiple Match;

    Best regards,
    Yann Pingot

  • Hello Yann,

    This is what I run in my lab:

    <params>
    <param param_name="reportFCHBA" param_value="false" />
    <param param_name="reportLUN" param_value="false" />
    <param param_name="ignoreHostsWithoutHostNames" param_value="true" />
    <param param_name="JOB_ID" param_value="VMware VirtualCenter Topology by VIM" />
    <param param_name="reportBasicTopology" param_value="false" />
    <param param_name="reportDiscoveredOsName" param_value="true" />
    <param param_name="discoverUnknownIPs" param_value="true" />
    <param param_name="runInSeparateProcess" param_value="true" />
    <param param_name="remoteJVMArgs" param_value="-Xms64m -Xmx8192m -XX:MaxMetaspaceSize=1024m" />
    <param param_name="reportPoweredOffVMs" param_value="false" />
    <param param_name="taskType" param_value="regular" />
    <param param_name="reportLogicalVolume" param_value="false" />
    <param param_name="reportLayer2connection" param_value="true" />
    <param param_name="maxThreadRuntime" param_value="2800000" />
    <param param_name="remoteJVMClasspath" param_value="%minimal_classpath%;../lib/axis.jar;../lib/axis-jaxrpc.jar;../lib/axis-wsdl4j.jar;../runtime/probeManager/discoveryResources/vmware/vim.jar;../runtime/probeManager/discoveryResources/vmware/vim25.jar" />
    </params>

     

    The remoteJVMArgs value (bolded in the original post) is the relevant part, as it sets strict limits for the remote JVM process that handles the actual discovery. In my scenario, the 8 GB maximum memory is tailored to an environment of about 4k VMs.
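
    To spot such differences quickly, the two <params> blocks can be compared with a few lines of Python. This is a minimal sketch using only the standard library; the helper names are made up for the example, and the blocks below are excerpts of the ones posted in this thread.

```python
import xml.etree.ElementTree as ET

def params_to_dict(xml_text: str) -> dict:
    """Map param_name -> param_value for a <params> block from the CommLog."""
    root = ET.fromstring(xml_text)
    return {p.get("param_name"): p.get("param_value") for p in root.iter("param")}

def diff_params(mine: dict, other: dict) -> dict:
    """Return {name: (my_value, other_value)} for differing or missing parameters."""
    names = sorted(set(mine) | set(other))
    return {n: (mine.get(n), other.get(n)) for n in names if mine.get(n) != other.get(n)}

# Excerpts of the two blocks posted in this thread.
yann = """<params>
<param param_name="remoteJVMArgs" param_value="" />
<param param_name="maxThreadRuntime" param_value="900000" />
<param param_name="reportPoweredOffVMs" param_value="true" />
<param param_name="runInSeparateProcess" param_value="true" />
</params>"""
lab = """<params>
<param param_name="remoteJVMArgs" param_value="-Xms64m -Xmx8192m -XX:MaxMetaspaceSize=1024m" />
<param param_name="maxThreadRuntime" param_value="2800000" />
<param param_name="reportPoweredOffVMs" param_value="false" />
<param param_name="runInSeparateProcess" param_value="true" />
</params>"""

for name, (a, b) in diff_params(params_to_dict(yann), params_to_dict(lab)).items():
    print(f"{name}: {a!r} -> {b!r}")
```

    Parameters that are identical in both blocks (runInSeparateProcess in this excerpt) are left out, so only the settings worth reviewing are printed.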

    If you have multiple trigger CIs (vmware_virtual_center), try to limit the job to only 1 MaxThread in the adapter. Also check whether you have duplicate vCenter CIs based on their InstanceUUID or related IP. Duplicates can cause duplicate discovery, which will overwhelm the reconciliation engine.
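
    If you export the vCenter CIs to a list, the duplicate check can be sketched as below. The record layout (ci_id, instance_uuid, ip) and the sample values are hypothetical, chosen only for the example; adapt the keys to whatever your export contains.

```python
from collections import defaultdict

def find_duplicates(vcenters, key):
    """Group vCenter records by `key` and keep only groups with more than one CI."""
    groups = defaultdict(list)
    for vc in vcenters:
        value = vc.get(key)
        if value:  # skip records where the attribute is empty or missing
            groups[value].append(vc["ci_id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

# Hypothetical export of vmware_virtual_center CIs.
vcenters = [
    {"ci_id": "ci-1", "instance_uuid": "uuid-A", "ip": "10.0.0.1"},
    {"ci_id": "ci-2", "instance_uuid": "uuid-A", "ip": "10.0.0.2"},
    {"ci_id": "ci-3", "instance_uuid": "uuid-B", "ip": "10.0.0.3"},
]

print(find_duplicates(vcenters, "instance_uuid"))  # {'uuid-A': ['ci-1', 'ci-2']}
print(find_duplicates(vcenters, "ip"))             # {}
```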

    I will try to post an interesting scenario with the Docker veth interfaces this week. If you have interfaces named like veth%@%, please delete them: they are short-lived interfaces linked to the Docker pods, and they are poisoning the reconciliation engine. I am testing a new fix to handle them differently, but it's not ready yet.
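
    To find candidates in an exported list of interface names, a small filter is enough. This sketch assumes veth%@% is meant as a SQL LIKE pattern, which corresponds to the glob veth*@*; the sample names are made up.

```python
from fnmatch import fnmatchcase

# Assumption: "veth%@%" is a SQL LIKE pattern, equivalent to the glob "veth*@*".
VETH_GLOB = "veth*@*"

def is_docker_veth(name: str) -> bool:
    """True if the interface name starts with 'veth' and contains an '@'."""
    return fnmatchcase(name, VETH_GLOB)

names = ["eth0", "veth1a2b3c@if5", "vethernet0", "ens192"]  # made-up sample names
to_delete = [n for n in names if is_docker_veth(n)]
print(to_delete)  # ['veth1a2b3c@if5']
```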