Karl2 Honored Contributor.

ArcSight ESM Distributed Correlation: Consumption Stopped

Dear All,

 

We're running ArcSight ESM 7.0 P2 in Distributed Mode (aka Distributed Correlation).

We've been testing for some months and everything was running fine, but we've hit an unexpected issue when increasing the input EPS (integrating more data sources):

clipboard_image_3.png

 

As can be seen, event consumption is automatically stopped and an internal message, "Mbus event consumption is paused", is produced. This directly affects our ingestion.

To avoid that, we explicitly changed the "Backpressure Mode" from auto to off. This let us ingest more events (so far, up to 22K) and the agents stopped caching heavily, but those messages are still being produced. I'm afraid this affects the proper consumption (and therefore the processing) of the events received by the ESM.

 

By any chance, has anyone else run into this issue?

Any thoughts are welcome. The cluster size is based on small-medium recommended configurations from ArcSight ESM guides.

 

BTW, we've also included the following settings on the server properties to allow higher EPS input:

#When using ESM in distributed correlation mode, the default capacity of internal buffers
#that temporarily store incoming events might limit persistence throughput.
queue.logger.pre-security-event-persistor.capacity=200000
queue.logger.start-of-flow.capacity=200000
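
To double-check that overrides like these are really present in the file the manager reads, a small script can help. The following is only a minimal Python sketch, not an ArcSight tool: the function name `read_queue_capacities` is hypothetical, and the inline sample stands in for the real server.properties content.

```python
def read_queue_capacities(properties_text):
    """Return {property_name: capacity} for every queue.*.capacity entry."""
    capacities = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (Java properties use '#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("queue.") and key.endswith(".capacity"):
            capacities[key] = int(value.strip())
    return capacities


# Sample text copied from the settings quoted in this post.
SAMPLE = """\
#When using ESM in distributed correlation mode, the default capacity of internal buffers
#that temporarily store incoming events might limit persistence throughput.
queue.logger.pre-security-event-persistor.capacity=200000
queue.logger.start-of-flow.capacity=200000
"""

if __name__ == "__main__":
    for name, capacity in read_queue_capacities(SAMPLE).items():
        print(f"{name} = {capacity}")
```

In practice you would read the text from the properties file on the persistor instead of using the inline sample.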

 

We've also opened a ticket with support (SD02488625), but there has been no progress over the last months, so I'm asking for any kind of help on this topic.

 

Thanks in advance,

 

Regards,

 

Karl.

Micro Focus Expert

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hey @Karl2 

I wanted to write a more extensive post to this topic on which angles we could check, and how to perform some more troubleshooting, but before that, could you please provide or update your initial post with the specs that you are currently running?
What I am specifically looking for is:

1. Virtual or physical?

2. Latency between persistor and its nodes, especially if virtual and they are running on different vmware nodes.

3. CPU/MEM and is it SSD?

4. Please let me know how you have distributed your correlators/aggregators and mbus'es over the nodes 🙂

After that I'll try to respond with a better way of troubleshooting it; I just wanted to make sure first that it's actually running with enough power.

-----------------------------------------------------------------------------------------
All topics and replies are based on my personal opinion, viewpoint and experience; they do not represent the viewpoints of MicroFocus.
All replies are made on a best-effort basis and cannot be taken as official support replies.
//Marius
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi Marius,

 

First of all, thanks for replying.

 

Answering your questions:

 

1. Virtual or physical? :

All my servers are VMs.

 

2. Latency between persistor and its nodes, especially if virtual and they are running on different vmware nodes. : A low-latency, high-bandwidth connection ensured by our hosting provider, no problems in this direction.

 

3. CPU/MEM and is it SSD? 

workers: 16 cores and 128 GB RAM.

persistor: 16 cores and 256 GB RAM.

Initial deployment of 3 nodes (2 workers and 1 persistor), following the small configuration recommendations. The OS and data (/opt/arcsight) file systems are on SSD.

4. Please let me know how you have distributed your correlators/aggregators and mbus'es over the nodes 🙂

Sure, it is distributed 🙂 :

 

$ /opt/arcsight/services/init.d/arcsight_services statusByNode
Node WORKER01: available
aggregator1(ws001_aggregator01): available
correlator1(ws001_correlator01): available
dcache2: available
mbus_control2: available
mbus_data1: available
repo2: available
Node PERSISTOR: available
aps: available
dcache1: available
execprocsvc: available
logger_httpd: available
logger_servers: available
logger_web: available
manager: unavailable
mbus_control1: available
mysqld: available
postgresql: available
repo1: available
Node WORKER02 available
aggregator2(ws002_aggregator01): available
correlator2(ws002_correlator01): available
dcache3: unavailable
mbus_control3: available
mbus_data2: available
repo3: available
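
With more nodes, a listing like this becomes hard to scan by eye. Below is a minimal Python sketch (an illustration only, with hypothetical function names) that parses text in the format shown above, lists every service that is not `available`, and counts instances per service type. It assumes the format stays exactly as pasted here, and it tolerates the missing colon on the WORKER02 node line.

```python
import re
from collections import Counter

# Listing copied from the statusByNode output quoted in this post.
SAMPLE = """\
Node WORKER01: available
aggregator1(ws001_aggregator01): available
correlator1(ws001_correlator01): available
dcache2: available
mbus_control2: available
mbus_data1: available
repo2: available
Node PERSISTOR: available
aps: available
dcache1: available
execprocsvc: available
logger_httpd: available
logger_servers: available
logger_web: available
manager: unavailable
mbus_control1: available
mysqld: available
postgresql: available
repo1: available
Node WORKER02 available
aggregator2(ws002_aggregator01): available
correlator2(ws002_correlator01): available
dcache3: unavailable
mbus_control3: available
mbus_data2: available
repo3: available
"""

NODE_RE = re.compile(r"Node\s+(\S+?):?\s+(\w+)$")


def parse_status_by_node(text):
    """Map node name -> {service name: status}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue  # skip blanks and the shell prompt line
        node_match = NODE_RE.match(line)
        if node_match:
            current = nodes.setdefault(node_match.group(1), {})
        elif current is not None and ":" in line:
            service, _, status = line.rpartition(":")
            current[service.strip()] = status.strip()
    return nodes


def unavailable_services(nodes):
    """Sorted (node, service) pairs whose status is not 'available'."""
    return sorted(
        (node, service)
        for node, services in nodes.items()
        for service, status in services.items()
        if status != "available"
    )


def count_by_type(nodes):
    """Count instances per service type, e.g. 'mbus_data1' -> 'mbus_data'."""
    counts = Counter()
    for services in nodes.values():
        for service in services:
            # Drop the trailing instance number and optional "(alias)" part.
            counts[re.sub(r"\d+(\(.*\))?$", "", service)] += 1
    return counts


if __name__ == "__main__":
    parsed = parse_status_by_node(SAMPLE)
    print("Not available:", unavailable_services(parsed))
    print("Instances per type:", dict(count_by_type(parsed)))
```

Run against the listing above, it flags the two entries that are easy to overlook in the raw output: `manager` on the persistor and `dcache3` on WORKER02.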

In this environment we can upgrade the underlying resources as needed; however, I need to know how to size those servers properly (e.g. if my aggregators are exhausted, increase the memory by a factor of "x"). Here the problem shows up when we try to ingest >4K events. Only disabling the backpressure mode helped to increase the input rate, and even so, the highlighted messages reveal a problem that should be addressed.

 

I believe this hardware and ESM version should be able to handle >15K with no problem (in theory from 75K up to 100K), but I need to figure out the proper settings and sizes.

 

Any inputs are welcome,

 

regards,

 

Karl.

Micro Focus Expert

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hello @Karl2 .

The issue with the sizing guides is that they are more of a general rule of thumb. The difficulty with creating a sizing guide for something like ESM is that there are too many factors to take into consideration: anything from memory quality and disk speed to the quality of the rules and content created and, of course, the EPS.

I would like to recommend a slightly different approach. I hope you don't mind, but I feel that oversizing single nodes might work against you in this case, and since you have the possibility to scale upwards, I think you might want to try a slightly different layout, with 3 workers and 1 persistor.

You might know all or some of this already but:

The reason for this is simply the principle on which Distributed Correlation works. All work between the persistor and its nodes is managed by the Kafka message bus. When an event is inserted into the persistor, it is added to one or more Kafka topics based on the rules and content it hits. These are normally called something like "p to c", "p to a", etc., which just means Persistor to Correlator or Persistor to Aggregator.

In many cases the size of a single node is less relevant than how many message buses are available and how many single actions can be performed at any given time (depending on how many nodes you have available). Your current setup leaves you at certain disadvantages:

1. You only have 1 repo, meaning that the online status of all your nodes and services is managed by a single service instead of 3. The number of repos can be either 1 or 3, and I would really recommend 3.

2. It's generally good to have an odd number of instances of a service, to prevent split-brain situations, meaning the other services should also run as 1, 3, 5, 7, etc. instances.
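
The odd-number advice follows from generic majority-quorum arithmetic. The Python sketch below illustrates it, under the assumption (not confirmed in this thread) that the repo and message bus control services decide by simple majority. The function names are made up for illustration.

```python
def quorum_size(instances):
    """Smallest number of instances that forms a strict majority."""
    return instances // 2 + 1


def tolerated_failures(instances):
    """How many instances can fail while a majority is still reachable."""
    return instances - quorum_size(instances)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(
            f"{n} instance(s): quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

Under that assumption, going from 3 to 4 instances does not buy any extra failure tolerance, and an even count can split into two equal halves with no majority on either side, which is why 1, 3, 5, ... are the useful sizes.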

What you might be seeing is that, even though the nodes are large, if the data cannot be processed faster than it is ingested, you will always end up with full message buses.

I would of course consider the possibility that the reasons might just as well be bad content (large joined rules, velocity expressions, heavy data monitors, massive active-lists etc), but let's look at that at a later date.

In the end my current recommendation would be:

Add a fourth node to your cluster, then make sure you have the same number of services as in the medium sizing guide, ignoring the RAM/CPU numbers and focusing only on the services. There is no need to reduce the sizing of your current nodes if you don't want to, of course.

This should leave you with:

Node 1 - 64GB RAM - 8/16 Cores:
persistor with a built-in distributed cache
one information repository


Node 2 - 64GB RAM - 8/16 Cores:
one correlator
one aggregator
one distributed cache
one message bus control
one message bus data

Node 3 - 64GB RAM - 8/16 Cores:
one correlator
one aggregator
one message bus control
one message bus data
one information repository

Node 4 - 64GB RAM - 8/16 Cores:
one correlator
one aggregator
one distributed cache
one message bus control
one message bus data
one information repository

As you can see, while you might have fewer resources on each node, the number of message bus, repo and correlator/aggregator services is increased.

 

Once this is set up and running, you have a good baseline for a full cluster that should scale quite well. Some rules of thumb:

Correlators require more CPU, while aggregators require more memory. If your Kafka topics towards the aggregators have a high queue count, then adding a second aggregator somewhere while increasing the memory might be a good solution; the same applies the other way around for correlators.

While adding new nodes and services, you might also need to increase the memory of the persistor. There is no need to increase anything before it's used, but it's good to keep in the back of your mind.

If your resource usage keeps increasing, though, it might be more content related instead.

Three workers is normally the point at which you can start scaling each node, which is why I start as low as 64GB, and why I prefer 3 workers over 2.

If memory is low, try increasing it in steps of 16GB at a time, and cores in steps of 4-6.

I hope this is a good enough initial description, and if you decide to go the way of 3 workers, feel free to come back to the topic and let us know if it worked out, if not we can continue and look at other aspects like some performance debugging on content and other possible issues with the setup.

//Marius
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi Marius,

 

Thanks for the detailed response.

Sure, I can give it a try, adding a node and removing some services from the persistor box.

While oversizing the cluster may help, I'd really like to know the best approach to adding nodes and which key services can be tuned or added to address the issue described.

 

Just one thing to highlight: I don't have 1 repo, but actually 3 configured:

$ /opt/arcsight/services/init.d/arcsight_services statusByNode | grep repo

repo2: available

repo1: available

repo3: available

 

However, from the ACC I can see just one as active:

clipboard_image_1.png

Is this actually expected?

I've reviewed the repo logs from the boxes but no issue (to my eyes) was found.

 

Thanks for all the help!

 

regards,

 

Karl.

Micro Focus Expert

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Ah okay, seems like that must be a bug in the web interface then. I usually have seen mine as all online, but as long as it is shown as online on the CLI then that should be fine.

I think the tuning I described earlier will help you with the issue you specified above, mostly because it's the amount of services that I see as an issue.

If the issue is simply a growing infrastructure, meaning that you need more resources than you have and must scale, you first need to pinpoint which type of resource you need (aggregation, correlation or dcache; the last one normally does not need to be extended beyond 3).

Then you can easily add more nodes if you want to keep each server with fewer resources, or add more services on existing servers if you would rather increase the resources on them instead.

 

//Marius
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi Marius,

I've tried to add another node to the cluster. The configuration went fine, but when starting the entire cluster again, the aggregators and correlators on the new node are always unavailable.

Reviewing those logs, I can find the reason:

...

[2019-11-20 15:29:37,891][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_NODE:8443/arcsight/servlet/XmlRpc
[2019-11-20 15:29:37,903][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_NODE:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-20 15:29:38,403][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_NODE:8443/arcsight/servlet/XmlRpc
[2019-11-20 15:29:38,418][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_NODE:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-20 15:29:38,418][ERROR][default.com.arcsight.server.AggregationServer] Unable to connect to the esm persistence manager
[2019-11-20 15:29:38,421][ERROR][default.com.arcsight.server.AggregationServer]

 

OK, this is expected: the new worker node is not able to connect to PERSISTOR_NODE because the full CA path is not yet imported into its truststore. So I imported the full CA into /opt/arcsight/manager/jre/lib/security/cacerts; however, the error persists, even after a full restart.

 

As usual with ArcSight, when trying to fix one thing, another gets broken 🙂

 

Any ideas?

 

Thanks in advance,

 

regards,

 

Karl.

Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi,

 

I've tried re-importing the full CA path into the ESM truststores:

 

  • Truststore for Manager (/opt/arcsight/manager/config/jetty/truststore)
  • Clientcerts for Manager (/opt/arcsight/manager/jre/lib/security/cacerts)

 

I believe it is only required for cacerts, but just in case I've done it for both stores.

However, the issue persists:

 

[2019-11-21 00:01:01,020][INFO ][default.com.arcsight.crypto.SSLConfiguration] Loading trust store: /opt/arcsight/manager/jre/lib/security/cacerts|JKS
[2019-11-21 00:01:01,042][INFO ][default.com.arcsight.server.management.ManagementAgent] Registered MBean 'Arcsight:service=CRLManager'.
[2019-11-21 00:01:01,051][INFO ][default.com.arcsight.crypto.SSLConfiguration] Successfully initialized truststore.
[2019-11-21 00:01:01,051][INFO ][default.com.arcsight.crypto.SSLConfiguration] Configured trust managers: [sun.security.ssl.X509TrustManagerImpl@290d10ef]
[2019-11-21 00:01:01,053][INFO ][default.com.arcsight.crypto.h] Initializing enabled SSL protocols: (FIPS mode) with default settings: [TLSv1, TLSv1.1, TLSv1.2]
[2019-11-21 00:01:01,053][INFO ][default.com.arcsight.crypto.h] Initializing enabled SSL protocols: (non-FIPS mode) with default settings: [TLSv1, TLSv1.1, TLSv1.2]
[2019-11-21 00:01:01,117][INFO ][default.com.arcsight.crypto.SSLConfiguration] Successfully initialized SSLContext
[2019-11-21 00:01:01,130][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:01,151][INFO ][default.com.arcsight.common.persist.remote.BoundedResourceProxyCache] Started with policy NoopEntryStatePolicy
[2019-11-21 00:01:01,151][INFO ][default.com.arcsight.manager.XmlRpcSessionContextMapper] No context found for thread main. Creating default context.
[2019-11-21 00:01:01,807][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:02,308][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:02,335][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:02,835][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:02,855][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:03,363][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:03,390][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:03,891][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:03,909][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:04,409][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:04,428][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:04,929][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:04,945][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:05,445][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:05,471][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:05,971][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:05,985][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:06,486][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:06,503][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:07,008][INFO ][default.com.arcsight.manager.XmlRpcManager] Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc
[2019-11-21 00:01:07,023][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] Failed to establish XmlRpc connection to https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: com.arcsight.manager.ConnectionException: Error while executing command: Received fatal alert: certificate_unknown
[2019-11-21 00:01:07,024][ERROR][default.com.arcsight.server.CorrelationServer] Unable to connect to the esm persistence manager
[2019-11-21 00:01:07,027][ERROR][default.com.arcsight.server.CorrelationServer]
java.rmi.ConnectException: Manager not reachable
at com.arcsight.server.BaseCorrelationServer.readyToStart(BaseCorrelationServer.java:107)
at com.arcsight.server.CorrelationServer.initialize(CorrelationServer.java:288)
at com.arcsight.server.CorrelationServer.main(CorrelationServer.java:713)
at com.arcsight.server.ServerReferences$CorrelationServerRef.main(ServerReferences.java:27)
[2019-11-21 00:01:07,035][INFO ][default.com.arcsight.server.a3] Shutdown Started.
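
Since the same warning repeats roughly every half second, it can help to collapse a log like this into counts per distinct message before comparing attempts. The following is a minimal Python sketch, assuming the bracketed `[timestamp][LEVEL][logger] message` layout seen above. The regular expression and function name are illustrative, not part of ArcSight.

```python
import re
from collections import Counter

# Matches lines such as: [2019-11-21 00:01:01,807][WARN ][default.some.Logger] text
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]\s?(?P<msg>.*)$"
)

RPC_WARNING = (
    "Failed to establish XmlRpc connection to "
    "https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc. Exception: "
    "com.arcsight.manager.ConnectionException: Error while executing command: "
    "Received fatal alert: certificate_unknown"
)

# A few lines taken from the log in this post.
SAMPLE = "\n".join([
    "[2019-11-21 00:01:01,130][INFO ][default.com.arcsight.manager.XmlRpcManager] "
    "Initializing XmlRpc client for https://PERSISTOR_SERVER:8443/arcsight/servlet/XmlRpc",
    "[2019-11-21 00:01:01,807][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] "
    + RPC_WARNING,
    "[2019-11-21 00:01:02,335][WARN ][default.com.arcsight.common.services.ManagerRPCChecker] "
    + RPC_WARNING,
    "[2019-11-21 00:01:07,024][ERROR][default.com.arcsight.server.CorrelationServer] "
    "Unable to connect to the esm persistence manager",
])


def summarize(log_text, levels=("WARN", "ERROR")):
    """Count distinct (level, short logger name, message) triples."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") not in levels:
            continue  # stack-trace lines and lower levels are ignored
        short_logger = match.group("logger").rsplit(".", 1)[-1]
        counts[(match.group("level"), short_logger, match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    for (level, logger, msg), n in summarize(SAMPLE).most_common():
        print(f"{n:>3}x {level:<5} {logger}: {msg[:90]}")
```

Feeding it the full correlator or aggregator log makes it easier to see whether a change (for example, re-importing the CA) altered the failure at all or only shifted the timestamps.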

 

This is quite strange. To my knowledge, all the settings are properly loaded. Any hints?

 

thanks in advance for the help provided,

 

regards,

 

Karl.

Micro Focus Contributor

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi, 

What is the output of the following command, run from the persistor node?

/opt/arcsight/manager/bin/arcsight certadmin -list 
 
Please compare the cacerts of the persistor with those of the newly added node that shows the issue, looking at everything displayed in the certificate details in the keytool GUI.
 
The error message is different from an "invalid certificate" error, so I would guess a format difference is the reason.
 
Regards,
Youngmin
 
Regards,
Young-min Kim

Engineer who has used HP/HPE/MicroFocus/Opsware/Mercury/Arcsight....
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi @Teacher K 

 

Thanks for highlighting that topic!

As part of the setup I approved all the certs (including the new node), but (surprisingly) the new node was only on "Submitted", so, I've proceeded to approve them all again, now all the certs are approved. Nothing Revoked.

 

After this change, the aggregator and correlator show no more errors in their main logs (aggregator.log/correlator.log), but new ones are logged in the std logs (aggregator.std.log/correlator.std.log):


[arcsight@WORKER_SERVER03 ~]$ tailf /opt/arcsight/var/logs/correlator4/correlator.std.log
2019-11-21 16:30:17 ARCSIGHT_JVM_OPTIONS: -server -Xms16384m -Xmx16384m -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/arcsight/var/tmp/correlator4/heapdump -Dcom.sun.management.jmxremote.port=No valid value for JMX port found for instance correlator4. Exiting. -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
2019-11-21 16:30:18 Error: Invalid com.sun.management.jmxremote.port number: No
2019-11-21 16:30:21 ArcSight Correlation Server starting...
2019-11-21 16:30:21
2019-11-21 16:30:28 ARCSIGHT_JVM_OPTIONS: -server -Xms16384m -Xmx16384m -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/arcsight/var/tmp/correlator4/heapdump -Dcom.sun.management.jmxremote.port=No valid value for JMX port found for instance correlator4. Exiting. -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
2019-11-21 16:30:28 Error: Invalid com.sun.management.jmxremote.port number: No
2019-11-21 16:30:32 ArcSight Correlation Server starting...
2019-11-21 16:30:32
2019-11-21 16:30:40 ARCSIGHT_JVM_OPTIONS: -server -Xms16384m -Xmx16384m -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/arcsight/var/tmp/correlator4/heapdump -Dcom.sun.management.jmxremote.port=No valid value for JMX port found for instance correlator4. Exiting. -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
2019-11-21 16:30:40 Error: Invalid com.sun.management.jmxremote.port number: No
^C
[arcsight@WORKER_SERVER03 ~]$
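
Reading the output above, an error message appears to have been substituted where the JMX port number belongs, so the JVM sees `-Dcom.sun.management.jmxremote.port=No` as the first whitespace-separated token and rejects it. A minimal Python sketch of that check follows. The function name is hypothetical, and the "good" port used in the second call is an arbitrary example value, not one taken from this cluster.

```python
PREFIX = "-Dcom.sun.management.jmxremote.port="


def jmx_port_problem(jvm_options):
    """Return None if the JMX port option holds a valid port, else a description."""
    for token in jvm_options.split():
        if token.startswith(PREFIX):
            value = token[len(PREFIX):]
            if value.isdigit() and 0 < int(value) <= 65535:
                return None
            return f"invalid JMX port value: {value!r}"
    return "no JMX port option present"


# Option string shortened from the correlator.std.log excerpt in this post.
BROKEN = (
    "-server -Xms16384m -Xmx16384m -verbose:gc "
    "-Dcom.sun.management.jmxremote.port=No valid value for JMX port found "
    "for instance correlator4. Exiting. "
    "-Dcom.sun.management.jmxremote.authenticate=false "
    "-Dcom.sun.management.jmxremote.ssl=false"
)

if __name__ == "__main__":
    print(jmx_port_problem(BROKEN))
    # 9999 is an arbitrary example port, used only to show the passing case.
    print(jmx_port_problem("-server -Dcom.sun.management.jmxremote.port=9999"))
```

The useful follow-up question is where the start script looks up the port for `correlator4` and why that lookup returns nothing on the new node.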

 

So there is something else behind this.

I have verified connectivity between all the nodes, so there is no problem there.

 

Thanks for all the help provided,

 

regards,

 

Karl.

Micro Focus Expert

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

From what I could see referenced in the ticket you mentioned earlier, you did in fact modify the JMX port, I believe, or were told to do so? Was that a change that also has to be applied on every worker node? If not, then maybe that change has to be reverted on the existing ones?

//Marius
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi Marius,

 

True. I first tried with the default settings, and then with the custom settings ("static rmi port") for mbus, aggregators and correlators; even so, after the change and a restart of the whole cluster, the issue persists.

 

At this point, I'm not sure what to do. During the initial setup we did not face any issues while adding nodes to the cluster, but once the cluster has been started it seems to be a little tricky.

 

regards,

 

Karl.

Micro Focus Expert

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

To be honest, I'm a bit unsure why you have had so many issues, including the ones you had before in the ticket plus the ones now.

I have set up at least 10 distributed correlation clusters at this point, and they went smoothly 9 times out of 10; the remaining one was fixed quite quickly.

I wonder if there have been any old changes or upgrade issues from earlier days that might be coming back now, but at this point I'm just grasping at straws.

I am guessing a reinstall of the software is out of the question, right, while it's still in development?

If not, the new ESM is coming out in a few days with a large number of fixes, so maybe wait until then and set it up from scratch on 7.2 instead of going the upgrade path.

Upgrading should not really be an issue, but if one upgrade had issues then the next upgrade will surely have the same.

//Marius
Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi Marius,

 

Well, in our case the experience is the opposite: 3 full set-ups following the guides step by step, and once deployed, we faced issues out of the box that were solved with a couple of configuration changes (discussed in the ticket).

Once the cluster was working and we ingested events at a decent scale (10K, 20K in the example), we began to see the issues described, and when trying to add a node, more weird issues.

 

So, based on your experience, what would be the size/number of nodes needed to achieve 80K - 100K EPS with default content? (I'm aware of filtering and aggregation, but we're not far from such a load.)

I plan to follow the guidelines for a large deployment, but I wonder about the right settings to achieve such a load.

So far, the experience with this new release has not been the best.

 

Thanks for all the help,

 

regards,

 

Karl.

Karl2 Honored Contributor.

Re: ArcSight ESM Distributed Correlation: Consumption Stopped

Hi All,

 

We've also noticed some unexpected messages being logged on the persistor server, in /opt/arcsight/var/logs/manager/default/server.std.log:

 

2019-12-03 12:34:15 HOSTINFO 1575376455750 1.0 0.0 0.0 99.0 90904 12298 3 0 0 0 0 0 244 0 5605 1721 0 0.0
2019-12-03 12:34:17,908 INFO [http-bio-127.0.0.1-9090-exec-3] - isAuthenticated - missed data for com.arcsight.product.esmclient.service.v1.impl.DataMonitorV2ServiceImpl.getViewableData
2019-12-03 12:34:17,908 ERROR [http-bio-127.0.0.1-9090-exec-3] - Re-throwing otherwise uncaught exception: com.arcsight.coma.bridge.AuthenticationException: Not authenticated for service 'com.arcsight.product.esmclient.service.v1.api.DataMonitorV2Service, method 'getViewableData'
2019-12-03 12:34:17 com.arcsight.coma.bridge.AuthenticationException: Not authenticated for service 'com.arcsight.product.esmclient.service.v1.api.DataMonitorV2Service, method 'getViewableData'
2019-12-03 12:34:17 at com.arcsight.product.esmclient.service.v1.impl.AccessControlledDataMonitorV2ServiceImpl.getViewableData(AccessControlledDataMonitorV2ServiceImpl.java:579)
2019-12-03 12:34:17 at com.arcsight.product.esmclient.service.v1.gwt.server.DataMonitorV2ServiceImpl.getViewableData(DataMonitorV2ServiceImpl.java:709)
2019-12-03 12:34:17 at sun.reflect.GeneratedMethodAccessor822.invoke(Unknown Source)
2019-12-03 12:34:17 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2019-12-03 12:34:17 at java.lang.reflect.Method.invoke(Method.java:498)
2019-12-03 12:34:17 at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:587)
2019-12-03 12:34:17 at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(RemoteServiceServlet.java:333)
2019-12-03 12:34:17 at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(RemoteServiceServlet.java:303)
2019-12-03 12:34:17 at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:373)
2019-12-03 12:34:17 at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
2019-12-03 12:34:17 at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
2019-12-03 12:34:17 at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
2019-12-03 12:34:17 at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
2019-12-03 12:34:17 at com.arcsight.coma.servlet.ServerContextFilter.doFilter(ServerContextFilter.java:56)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
2019-12-03 12:34:17 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
2019-12-03 12:34:17 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218)
2019-12-03 12:34:17 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
2019-12-03 12:34:17 at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:506)
2019-12-03 12:34:17 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
2019-12-03 12:34:17 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
2019-12-03 12:34:17 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
2019-12-03 12:34:17 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
2019-12-03 12:34:17 at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1087)
2019-12-03 12:34:17 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
2019-12-03 12:34:17 at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:318)
2019-12-03 12:34:17 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019-12-03 12:34:17 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019-12-03 12:34:17 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
2019-12-03 12:34:17 at java.lang.Thread.run(Thread.java:748)
2019-12-03 12:34:19 Memory Status: 1,385.6 MB Used, 15,993.0 MB Max

 

Even so, the CLI shows all the services as available, but the ACC does not. Any ideas?

 

Regards,

 

Karl.
