Support Tip: Troubleshooting VMware events collection problems in Cloud Optimizer

This document provides tips for troubleshooting and solving the most common VMware event collection problems in Cloud Optimizer (COpt). It refers to the tools pvdump and pvsupport, which are available for download in the troubleshooting toolkit PVTK.


How to check which events have been collected

Use the tool pvdump to list the actual collected VMware events:

# /opt/OV/contrib/PVTK/pvdump -events
# /opt/OV/contrib/PVTK/pvdump -csv -events

The option -csv generates the output in a format suitable for loading into a spreadsheet for analysis.


Event collection interval and time window

Cloud Optimizer collects VMware events at every collection interval. The collection interval is the same as for metrics collection and is configured in the setting [pvcd.vcenter]CollectionInterval:

# ovconfget pvcd.vcenter CollectionInterval
300

The value of this setting is expressed in seconds and defaults to 300 if not set.
Valid values are 300 and 900. Any other value will default to 300.
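
The fallback rule above can be sketched as a small shell helper. This is only an illustration of the rule as stated in this tip, not code taken from Cloud Optimizer; the function name effective_interval is hypothetical.

```shell
# Sketch of the documented rule: only 300 and 900 are valid,
# anything else (including an unset value) behaves like 300.
# effective_interval is a hypothetical helper name.
effective_interval() {
  case "$1" in
    900) echo 900 ;;
    *)   echo 300 ;;
  esac
}

# Example call on a COpt server (not executed here):
#   effective_interval "$(ovconfget pvcd.vcenter CollectionInterval)"
```

Comparing the printed value with the configured one is a quick way to spot a setting that is silently ignored.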

When the Cloud Optimizer 3.04 processes are started, they collect only the VMware events that are created on the vCenter after the start. Events that were created while the processes were stopped are not collected.

When the Cloud Optimizer 3.10 processes are started, they collect the VMware events that were created on the vCenter up to one hour before the start. This makes it possible to stop Cloud Optimizer for up to one hour without losing VMware events. The creation time and event ID of the last collected event are stored so as to avoid duplicate collection.


Configuring the event types that should be collected

Cloud Optimizer (COpt) collects the events of types that are listed in the file:

# /opt/OV/newconfig/OVPM/smepack/VCENTER/data/VIEventTypes.cfg

By default, this file is populated with a few common types of events, but some of these are commented out.
Usually, some additional event types are desired and should be added to the file or uncommented.
A common type that one may want to collect is AlarmStatusChangedEvent.
To enable the collection of this type of event, its entry must be uncommented in the file.
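
As a rough sketch, uncommenting an entry could be scripted as shown below. This assumes that commented-out entries start with the character '#', which this tip does not state, so check your own copy of VIEventTypes.cfg first. The function name uncomment_event_type is hypothetical, and it is safer to try it on a backup copy of the file.

```shell
# ASSUMPTION: a commented-out entry looks like "#AlarmStatusChangedEvent"
# or "# AlarmStatusChangedEvent". Verify this against your own file.
# uncomment_event_type is a hypothetical helper name.
uncomment_event_type() {
  cfg_file=$1
  event_type=$2
  tmp_file="${cfg_file}.tmp.$$"
  # Strip the leading '#' only from the line that holds exactly this event type.
  sed "s/^[[:space:]]*#[[:space:]]*\(${event_type}\)[[:space:]]*\$/\1/" \
    "$cfg_file" > "$tmp_file" &&
    cat "$tmp_file" > "$cfg_file" &&   # overwrite in place, keeping owner and mode
    rm -f "$tmp_file"
}

# Example call (not executed here):
#   uncomment_event_type /opt/OV/newconfig/OVPM/smepack/VCENTER/data/VIEventTypes.cfg AlarmStatusChangedEvent
```

Other commented-out entries are left untouched, because the pattern matches the given event type only.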

Note that if this file does not list any event type, or if the file does not exist, COpt collects events of every type. This can generate a large volume of collected events and is practical only as a temporary measure to create a sample of events. This sample can be dumped into a CSV file, using the tool pvdump:

# /opt/OV/contrib/PVTK/pvdump -csv -events

This sample can then be loaded and analyzed in a spreadsheet to identify the events of interest and populate the file VIEventTypes.cfg accordingly.
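
If no spreadsheet is at hand, a first tally can also be produced on the command line. The sketch below counts the distinct values found in one column of a CSV file. The column that holds the event type in the pvdump output is not documented in this tip, so the column number is a parameter that you need to determine from your own sample. The function name count_by_column is hypothetical, and the parsing is naive (it assumes that fields do not contain embedded commas).

```shell
# Count how often each value occurs in a given column of a CSV file.
# ASSUMPTIONS: plain comma-separated fields without embedded commas;
# the caller knows which column holds the event type.
# count_by_column is a hypothetical helper name.
count_by_column() {
  csv_file=$1
  column=$2
  awk -F',' -v col="$column" '
    NF >= col { count[$col]++ }
    END { for (k in count) printf "%d %s\n", count[k], k }
  ' "$csv_file" | sort -k1,1nr -k2,2
}

# Example calls (not executed here; the column number 2 is a placeholder):
#   /opt/OV/contrib/PVTK/pvdump -csv -events > /tmp/events.csv
#   count_by_column /tmp/events.csv 2
```

The most frequent values are printed first. If the file has a header row, the header simply shows up as one more entry with a count of 1.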


Events filtering based on object type

Once an event is collected, Cloud Optimizer identifies the object to which this event relates (e.g. a VM, a Host, a Datastore...). When the event is stored in the database, it will be associated with this related object.

Events that relate to an object type that is not collected by COpt, e.g. VMware users, are dropped by default.
If all collected events should be preserved, use the following setting:

# /opt/OV/bin/ovconfchg -ns pvcd.vcenter -set LogUnmatchedEventsToTarget true

This will ensure that events related to an object type that is not collected by Cloud Optimizer will be preserved, stored in the Cloud Optimizer database and associated with the vCenter itself.


Event collection batch size

The setting BatchSizeofEventsPerTarget was introduced in Cloud Optimizer 3.10 (and in some site-specific versions of Cloud Optimizer 3.04). It determines the batch size of collected events for each collection interval and must be initialized.
The ideal value depends on the environment, but a good starting point is somewhere between 20 and 200:

# /opt/OV/bin/ovconfchg -ns pvcd.vcenter -set BatchSizeofEventsPerTarget 100

If this setting is missing, the logfile /var/opt/OV/log/status.virtserver may show entries like:

ERROR [2021-04-13 19:50:59,748] : Thread[pool-119-thread-1,5,main] EVT: While reading the next events 
      at com.sun.proxy.$Proxy37.createCollectorForEvents(Unknown Source)
      at com.hp.virt.vcenter.VIEventCollector.CollectEvents(VIEventCollector.java:382)
      at com.hp.virt.vcenter.EventTask.run(VIvCenterTasks.java:524)

After some time, the entries change to:

ERROR [2021-04-14 16:33:22,940] : Thread[pool-72-thread-1,5,main] EVT: While reading the next events com.vmware.vim25.InvalidStateFaultMsg: The operation is not allowed in the current state.
      at sun.reflect.GeneratedConstructorAccessor424.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

If the latter error is shown, the retries have exhausted the maximum number of open collectors per connection on the vCenter. In that case, first configure a value for this setting and then restart the collector:

# /opt/OV/contrib/PVTK/pvsupport -restart pvcd
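
To tell the two error states apart quickly, the log can be checked with a small helper like the one below. It is a sketch that relies only on the two message patterns quoted above; the function name classify_event_errors and the labels it prints are hypothetical.

```shell
# Look at the most recent "While reading the next events" entry in the log
# and report which of the two states described above it corresponds to.
# classify_event_errors and its output labels are hypothetical.
classify_event_errors() {
  last_error=$(grep 'EVT: While reading the next events' "$1" | tail -n 1)
  case "$last_error" in
    '')                     echo "ok" ;;              # no such error in the log
    *InvalidStateFaultMsg*) echo "restart-needed" ;;  # collectors exhausted on the vCenter
    *)                      echo "read-error" ;;      # earlier stage, check the batch size setting
  esac
}

# Example call (not executed here):
#   classify_event_errors /var/opt/OV/log/status.virtserver
```

Only the last matching entry is considered, so old errors that were already fixed do not hide a currently healthy state, as long as newer error entries are absent. The result "ok" therefore only means that no such entry exists in the file at all.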

System clock and event ID synchronization

Cloud Optimizer collects events from the vCenter based on a sliding time window.
It is important that the system clock of the Cloud Optimizer server and the vCenter are synchronized.
If they are not, events can be partially or completely missed.
After synchronizing the clocks, it may be necessary to reset the sliding time window on the Cloud Optimizer server:

# ovconfget pvcd.vcenter | grep LastEventCollectedTimeStamp

Identify the name of the setting that refers to your vCenter and clear it.
For instance:

# ovconfchg -ns pvcd.vcenter -clear LastEventCollectedTimeStamp__vcw701.gale7.net

Cloud Optimizer also memorizes the event ID of the last collected event.
While this has not been observed so far, this event ID might get out of sync with the vCenter after some maintenance. To reset it on COpt, use:

# ovconfget pvcd.vcenter | grep LastEventIDProcessed

Identify the name of the setting that refers to your vCenter and clear it.
For instance:

# ovconfchg -ns pvcd.vcenter -clear LastEventIDProcessed__vcw701.gale7.net

Note that clearing these settings could result in duplicate collected events for the last hour.
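
Both reset commands can be generated for a given vCenter with a helper such as the following sketch. It only prints the commands so that they can be reviewed before being run. The naming pattern (setting name, two underscores, vCenter name) is inferred from the examples above, so confirm the actual names with ovconfget first. The function name print_reset_commands is hypothetical.

```shell
# Print (do not run) the two ovconfchg commands that clear the sliding
# time window and the last event ID for one vCenter.
# ASSUMPTION: the setting names follow the pattern <setting>__<vCenter name>,
# as in the examples above. print_reset_commands is a hypothetical helper name.
print_reset_commands() {
  vcenter=$1
  for prefix in LastEventCollectedTimeStamp LastEventIDProcessed; do
    printf 'ovconfchg -ns pvcd.vcenter -clear %s__%s\n' "$prefix" "$vcenter"
  done
}

# Example call (not executed here):
#   print_reset_commands vcw701.gale7.net
```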


Event collection logs

The event collection logs are written to /var/opt/OV/log/status.virtserver.
To isolate the entries related to event collection, search for the keyword EVT:

# grep EVT /var/opt/OV/log/status.virtserver

It may be useful to increase the logging level to TRACE:

# /opt/OV/contrib/PVTK/pvsupport -trace vidaemon:trace:100MB


The following log entries can be used as anchor points to analyze the trace:

INFO  [2021-06-25 12:06:03,447] : Thread[ViVcenterCollector-3045,5,main] EVT: Collecting and sending events for vcw701.gale7.net

The above line indicates the start of a new event collection for a vCenter.

INFO  [2021-06-25 12:06:03,593] : Thread[pool-1517-thread-1,5,main] EVT: Starting to collect events in EventCollector from: Fri Jun 25 11:10:54 CEST 2021and end time is: Fri Jun 25 12:10:54 CEST 2021for vcenter: vcw701.gale7.net

The above line provides the sliding time window for which events will be collected during this interval.

INFO  [2021-06-25 12:06:05,270] : Thread[pool-1517-thread-1,5,main] EVT: Total No of events collected in this interval is 3

The above line provides the number of events that have been collected in this interval. These are the events created during the sliding time window that are of a type listed in VIEventTypes.cfg.

INFO  [2021-06-25 12:06:05,798] : Thread[pool-1517-thread-1,5,main] EVT: New event, Processing it 383413
INFO  [2021-06-25 12:06:05,798] : Thread[pool-1517-thread-1,5,main] EVT: Event name is: VmPoweredOnEvent
INFO  [2021-06-25 12:06:05,798] : Thread[pool-1517-thread-1,5,main] EVT: VM: MOR details is: com.vmware.vim25.ManagedObjectReference@2a952d4aand vm name is: vm704mor value isvm-131
INFO  [2021-06-25 12:06:05,798] : Thread[pool-1517-thread-1,5,main] EVT: VM:  Getting the morvalue from the m_morToMorValueMapvm-131
INFO  [2021-06-25 12:06:05,798] : Thread[pool-1517-thread-1,5,main] EVT: VM: Got the UUID from the prop map: 421cd206-3ab4-1f3e-55e7-b259d81e487bfor vm name: vm704

The above lines show the processing of one collected event. Cloud Optimizer identifies the object related to this event (in this case a virtual machine with the display name vm704) and tries to match it with the objects collected by Cloud Optimizer.

INFO  [2021-06-25 12:06:05,908] : Thread[pool-1517-thread-1,5,main] EVT:  interesting events found by EventCollector: 2

The above line provides the number of events that are related to an object that is collected by Cloud Optimizer. These events will be associated with their related object and written into the Cloud Optimizer database.
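
The two counters described above can be extracted from the log with a short sed filter. The sketch below relies only on the message texts quoted in this section; the function name summarize_event_counts and the output labels collected and matched are hypothetical.

```shell
# Print the per-interval counters found in the log:
#   collected=N  events of a listed type found in the sliding time window
#   matched=N    events related to an object that Cloud Optimizer collects
# summarize_event_counts and the labels are hypothetical names.
summarize_event_counts() {
  sed -n \
    -e 's/.*EVT: Total No of events collected in this interval is \([0-9][0-9]*\).*/collected=\1/p' \
    -e 's/.*EVT:  *interesting events found by EventCollector: \([0-9][0-9]*\).*/matched=\1/p' \
    "$1"
}

# Example call (not executed here):
#   summarize_event_counts /var/opt/OV/log/status.virtserver
```

If the matched value is regularly lower than the collected value, some events probably relate to object types that Cloud Optimizer does not collect and are dropped unless LogUnmatchedEventsToTarget is set.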


Problems on the vCenter itself

There can be problems on the vCenter itself that prevent Cloud Optimizer from collecting the events.
To confirm such a problem, it can help to take Cloud Optimizer out of the loop and use pure VMware tools, which can expose the problems that Cloud Optimizer encounters when trying to collect the events.

The PowerShell script eventscollector.ps1 uses steps similar to those of Cloud Optimizer for collecting all events logged on the vCenter during the last two hours. Run it from a VMware-enabled PowerShell:

# Install-Module -Name VMware.PowerCLI
# Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false
# connect-viserver -server [VCENTER_FQDN] -user [USER]
(Provide the password)
# eventscollector.ps1 2>&1 | Tee-Object -FilePath eventscollector.log

Specify the same user that was configured in Cloud Optimizer to connect to the vCenter.
This username can be found from this command:

# ovconfget pvcd.vcenter Targets

The script collects the events logged on the vCenter during the last two hours and prints them to the terminal.
If no events are printed, then you have reproduced the events collection problem using pure VMware commands.
You can try again with a different user to check if the problem results from user privileges, or you can run it connected to another vCenter to compare.
The next step would probably be to contact VMware support.
