Failover ESM destination is receiving some events
We have two ESMs in HA and, in our SmartConnectors, have configured one as the primary destination and the other as the failover.
When we check the "event throughput" dashboard in ESM, we see events on both ESMs. We checked these events on each ESM and they are different events, not duplicates.
This is a problem for us because we need to keep the events of each SmartConnector on one ESM and use the failover only when the primary ESM is down.
While looking for problems in our SmartConnector, we found a lot of thread dumps in our logs folder, but we can't identify any problem or error in them.
What can we check to find the source of the problem?
Firstly, I saw issues a few years ago around failover destinations on a connector. In certain situations, usually during a restart, some events could get sent on to the failover when they shouldn't. This was fixed in later releases, so it would be good to confirm which versions of the SmartConnectors you have deployed. As standard, I would always recommend upgrading to the latest framework version anyway, but let's check what is going on here.
Additionally, do check what the dashboards are saying and what they are actually showing. SmartConnectors generate something like 5-10 events per minute even at idle, so you should expect to see some numbers in the event flow. This should not show up as EPS (the calculated number is too small), but it will show up as total events received from the connector over the course of a day, for example. I would encourage you to make sure you know what is coming in and which events are being received on a per-connector basis.
The easiest and fastest way to do this is to go to the connector itself (assuming you have management in ESM), right-click it and select Create New Channel with Filter. This opens a new Active Channel for the data from this particular connector only. Click the part where it shows the date and time to open the edit dialog, then change the value from 'At Attach Time' to 'Continuously evaluate' and the channel will refresh constantly for you.
Once you have this (and it's a good way to do it), you will see both internal events and log events from the sources. Hopefully you are seeing no log events, but you should see internal ones (which are normally not viewed). That's fine - just confirm what you have and let us know.
As for troubleshooting the connectors, I encourage you to take a look at these:
These will give you a great place to start on connectors to see what is happening. Take a look at some of them, and dig into the agent.log files on the connectors. They give an incredible amount of information about what is happening, what is getting processed and what the potential issues might be. If you are getting thread dumps, that suggests something is going wrong that causes a restart, and this could be causing problems with getting the right data to the right destination. So fix this and I am pretty sure we will sort things out.
Thanks for your answer. Our connector version is 184.108.40.20679.0 but, for now, we can't upgrade it.
Checking our dashboards, we saw the following values for one of our connectors:
Primary ESM: 107,6 E/s(avg.) - 9,3M E/day
Failover ESM: 34,3 E/s(avg.) - 3,0M E/day
This is our problem. Every day, our SmartConnector sends 3 million events to the failover ESM. Maybe the thread dumps could give some information about the problem, but after reading them we have no clue how to solve anything.
We tried changing the memory of the JVM, but this change had no effect. Do you think the JVM could be the source of the problem? Maybe network issues? Or could the number of events per second be the cause?
I would absolutely be looking at the events themselves. While there is a difference between a few status messages per minute and 3M events per day, this can easily escalate. For example, connector restarts, status messages and even device status events for the log sources could quite easily add up to 3M events in a day. For reference, that's only about 34 EPS, so you can easily hit this with errors and problems!
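To sanity-check the numbers, here is a minimal Python sketch that converts between average EPS and events per day, using the dashboard figures quoted earlier in this thread. The function names are made up for illustration and are not part of any ArcSight tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(avg_eps):
    """Convert an average events-per-second rate to events per day."""
    return avg_eps * SECONDS_PER_DAY


def average_eps(events_in_day):
    """Convert a daily event total to an average events-per-second rate."""
    return events_in_day / SECONDS_PER_DAY


def failover_share(primary_per_day, failover_per_day):
    """Fraction of all events that ended up on the failover destination."""
    return failover_per_day / (primary_per_day + failover_per_day)


if __name__ == "__main__":
    # Dashboard values from the thread: 107.6 E/s on primary, 34.3 E/s on failover.
    primary = events_per_day(107.6)
    failover = events_per_day(34.3)
    print(f"primary:  {primary:,.0f} events/day")
    print(f"failover: {failover:,.0f} events/day")
    print(f"failover share: {failover_share(primary, failover):.1%}")
    print(f"3M events/day is {average_eps(3_000_000):.1f} EPS on average")
```

With these figures, roughly a quarter of the connector's events are landing on the failover ESM, which is far more than idle status traffic would explain.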
You absolutely need to understand what the events that are being received are - if you can see what they are, we can figure out what is going on. Also take a look at the agent.log files for the connectors themselves - you need these to understand what is happening at the connectors. Restarts, issues, communication problems and so on.
The JVM size will make a difference, especially if the connector is heavily loaded or running out of memory. Increasing it should improve stability and hopefully stop a lot of the messages. However, we don't know what the messages are, so this is a gamble at best.
Thread dumps are useful in specific situations around problem connectors or a crashing ESM. They allow you to understand what was happening just before or during the problem. They won't really help here though, so dig into the agent.log files for the connectors. Also look into the ESM logs: server.log and server.default.log are good ones to start with. While they will be full of lots of other things, try to locate the communications from the connectors and see what is happening.
Finally I found a configuration change that helps with this problem. I've added these two lines to agent.properties:
And now the failover ESM isn't receiving any events.
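As a starting point for digging through large log files, here is a rough Python sketch that counts how many lines mention a few keywords. The keyword list is purely hypothetical: I am not assuming any particular message wording in agent.log or server.log, so adjust it to whatever you actually see in your own files.

```python
from collections import Counter

# Hypothetical search terms, not actual ArcSight log messages.
HYPOTHETICAL_KEYWORDS = ["restart", "failover", "connection", "exception"]


def count_keyword_lines(lines, keywords):
    """Count, per keyword, how many lines contain it (case-insensitive)."""
    counts = Counter({kw: 0 for kw in keywords})
    for line in lines:
        lowered = line.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1
    return counts


def scan_file(path, keywords=HYPOTHETICAL_KEYWORDS):
    """Run the keyword count over a log file on disk."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as handle:
        return count_keyword_lines(handle, keywords)
```

For example, `scan_file("agent.log").most_common()` lists the keywords from most to least frequent. A keyword with an unexpectedly high count is a good place to start reading the surrounding lines.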
Thanks to all!