Did you know that ESM does Connector & Device Monitoring?
It's well known that ESM has a level of monitoring for both SmartConnectors and log sources, and that it has been relatively poorly explained or covered in the past. I wanted to address some of this and also cover what works as standard.
I am not going to dig into the exact mechanisms that are used, mainly because ESM is actually being replaced by ArcMC as the management source for this type of scenario. In fact, please see my other post on this very same subject:
To summarize though, ESM can do it and will still do it in the future (the functionality isn't going away), but ArcMC is where the effort, time and functionality will be spent going forward. Therefore I really recommend moving towards ArcMC for your SmartConnector and log source monitoring.
Firstly, let's address a few points though:
- SmartConnectors can be managed by many different mechanisms and while this works with ESM, the recommendation is to use ArcMC going forward.
- Device status monitoring is the term used for monitoring the data sent from log sources, so think of it as checking whether the log sources are still sending logs.
- There is a lot of content available for this, but I will cover what is available as standard.
You may or may not be aware that Activate is the framework that we all should be using for driving content forward in a simple to author, reuse and distribute format. There are actually a number of packages provided FREE OF CHARGE for system and device monitoring and you really should use them:
You will need the base package for Activate, then the base package for the system monitoring and then the connector monitoring package. See the links above, but this is pretty simple and straightforward. Also, please read the instructions on installing the packages, as incorrect installation will overwrite any customizations that you have done. Also, please note that this is CONNECTOR management and monitoring and not device status (log source).
When you have installed it (see the instructions from the links above), you will find a bunch of content ready for use. Activate framework packages are to be taken as a starting point for content and are built to make it easy to extend. However, what most people want is a dashboard to start with - so there are a bunch of dashboards already present:
Taking a closer look at the dashboards, you can see the Connector Version Detail dashboard below. This is just a breakdown of the connectors installed, their versions and overall status. It's useful as an overview of the versions in place and what they are doing.
Next up is the Connector Configuration Overview dashboard. This is a simple table-format dashboard that covers logical errors or problems with the connectors being monitored. Here we can see status, parser errors (and which ones - please note that this is available from SmartConnector framework release 7.3 or later) and any outstanding event issues that the connectors have:
Next up is the ASM Connector overview dashboard, which shows the rule fire summary by connector. This means you can see at a glance whether a connector has triggered any rules and which ones. Of course, being a dashboard, you can drill down and see what is going on.
This is all useful, but what about the underlying messages in an Active Channel? There are plenty available, but one of the more useful ones is the parsing errors channel - Connector Parsing Errors. One of the big issues with SmartConnectors is identifying when parsing errors occur and which events are affected. We can now address this: you can see the messages and the details attached to them, so you can trap all situations and make sure that no unparsed messages pass through the system.
Standard Content for SmartConnector Monitoring
The standard content for SmartConnector monitoring isn't as sophisticated or extensive, but it's actually really good at giving you real-time information on event rates and the overall status of what is occurring. While it isn't necessarily grouped in a single location, go to the Shared/ArcSight Administration/ESM/System Health/Events/Event Throughput dashboard to see an overall view of connectors, EPS rates and their current min / max rates. This is a special chart and is real-time.
Next up is the Shared/ArcSight Administration/Connectors/System Health/Connector Connection Status dashboard. This is a pretty simple and straightforward one that shows the current status of each connector and which ones are caching. This is really useful for understanding whether you have any issues or problems with the event flow in general.
As you can see, the standard content for SmartConnector monitoring is really focused on event flow and what is working / not working. This is very different from the Activate content package for connector management, which digs into a lot more specifics around the connectors, their status and parsing processes.
Standard Content for Device Status Monitoring
Some device status monitoring content has appeared over the years and has unfortunately fallen into the realm of unsupported due to lack of community support. However, don't forget that there is a considerable amount of content that is already present and I strongly recommend that you use and abuse what is already there - don't think for a second that there is no content. There is and here it is!
The first thing to do is turn on device status monitoring at the connector level. I have shown this from the ESM connector management process, but let's first take the Get Status command and see what information you get back:
Run the command and you will get the following - scroll down to the devices section and see what the system is tracking:
Please note that the system tags devices based on the hostname, IP address, and the deviceVendor and deviceProduct in question. This is how decisions are made for multiple-source SmartConnectors, but in essence it's pretty simple and straightforward. We are tracking the number of events by device and this is what provides us with a tracking capability. Remember, this is done AT THE SMARTCONNECTOR level.
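To make the idea of per-device tracking concrete, here is a minimal Python sketch of counting events by a composite key of hostname, IP address, deviceVendor and deviceProduct. It is only an illustration of the concept, not how the SmartConnector implements it. The `DeviceKey` type, the `count_events_by_device` function, the dictionary field names and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class DeviceKey(NamedTuple):
    # Hypothetical composite key mirroring the four fields named in the post.
    hostname: str
    address: str
    device_vendor: str
    device_product: str


def count_events_by_device(events: Iterable[Dict[str, str]]) -> Counter:
    """Tally events per (hostname, IP, deviceVendor, deviceProduct) combination."""
    counts: Counter = Counter()
    for event in events:
        key = DeviceKey(
            event.get("hostname", ""),
            event.get("address", ""),
            event.get("deviceVendor", ""),
            event.get("deviceProduct", ""),
        )
        counts[key] += 1
    return counts


# Made-up sample: one host reporting as two different vendor/product pairs.
sample_events = [
    {"hostname": "fw01", "address": "10.0.0.1", "deviceVendor": "VendorA", "deviceProduct": "Firewall"},
    {"hostname": "fw01", "address": "10.0.0.1", "deviceVendor": "VendorA", "deviceProduct": "Firewall"},
    {"hostname": "fw01", "address": "10.0.0.1", "deviceVendor": "VendorB", "deviceProduct": "IDS"},
]

device_counts = count_events_by_device(sample_events)
for key, total in device_counts.items():
    print(key.hostname, key.address, key.device_vendor, key.device_product, total)
```

Because the vendor and product are part of the key, the same host shows up as two tracked devices in this sample, which is the behaviour you would expect from a connector that receives several log types from one address.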
Go to the connector in question and go to the Default tab. Select the Enable Device Status Monitoring field and change it from -1 (default) to the desired number of milliseconds. The consideration here is the time interval BETWEEN messages after which we can consider the log source to have stopped sending. This is where the EPS rates come in useful. Use the tracking status data above to work out approximate EPS rates for the critical devices and then enter the generic setting here. Remember it's in milliseconds, so 60000 is 60 seconds or 1 minute, and this is the minimum:
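If you want to sanity-check the number before typing it in, the conversion can be sketched in a few lines of Python. This is a rough helper under stated assumptions: the 60000 ms minimum comes from the text above, while the function names and the `safety_factor` heuristic (allow several average gaps between events before declaring a device silent) are hypothetical and not part of any ArcSight tooling.

```python
MIN_INTERVAL_MS = 60_000  # 60000 ms (1 minute) is the minimum, per the text above


def monitoring_interval_ms(seconds_of_silence: float) -> int:
    """Convert a tolerated silence in seconds to milliseconds, clamped to the minimum."""
    if seconds_of_silence <= 0:
        raise ValueError("seconds_of_silence must be positive")
    return max(MIN_INTERVAL_MS, int(round(seconds_of_silence * 1000)))


def suggest_interval_ms(eps: float, safety_factor: float = 10.0) -> int:
    """Hypothetical heuristic: tolerate `safety_factor` average inter-event gaps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    average_gap_seconds = 1.0 / eps
    return monitoring_interval_ms(average_gap_seconds * safety_factor)


print(monitoring_interval_ms(300))  # 5 minutes expressed in milliseconds
print(suggest_interval_ms(5.0))     # a busy device ends up at the minimum
print(suggest_interval_ms(0.01))    # a quiet device gets a much longer interval
```

The setting is generic for the connector, so in practice you would pick a value that suits the quietest critical device behind it.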
Press apply and then wait for a while - it takes time for the status tables to build at the SmartConnector level and then report it back to ESM. Thankfully there are some dashboards that you can use to monitor things going on though. Take a look under Shared/ArcSight Administration/Devices/All Monitored Devices to see the following dashboard:
Here you can see that things are progressing well and we are tracking a number of devices (remember it's a match of hostname / IP as well as deviceVendor and deviceProduct that we use to make the match). Above we can see it's running well with no issues. But if I were to stop the replay connector that is sending these events, I would start to trigger the rules. Note that it takes a minimum of 60 seconds / 1 minute to trigger the connector to send the event, but then we assume we won't do anything for 20 minutes (though this can be changed if needed):
Hey presto, as I stop the replay connector, the data stops flowing and inactive devices are identified and updated. The active list drops and the inactive list increases. How does it drive this? Active lists, of course, so check out the lists in /Shared/ArcSight Administration/Devices/All Monitored Devices for the complete list of what is being tracked and the relevant data around event numbers and the last time we saw an event.
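Conceptually, the active / inactive split boils down to comparing each device's last-seen time against a timeout. The Python sketch below shows that idea only. It is not how ESM's rules and active lists are implemented, and the function name, device names, timestamps and 5-minute timeout are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_activity(
    last_seen: Dict[str, datetime], now: datetime, timeout: timedelta
) -> Tuple[List[str], List[str]]:
    """Return (active, inactive) device names based on time since the last event."""
    active: List[str] = []
    inactive: List[str] = []
    for device, seen in sorted(last_seen.items()):
        if now - seen <= timeout:
            active.append(device)
        else:
            inactive.append(device)
    return active, inactive


# Made-up snapshot: one device reported recently, one has gone quiet.
now = datetime(2020, 1, 1, 12, 0, 0)
last_seen = {
    "fw01": now - timedelta(minutes=2),
    "ids01": now - timedelta(minutes=25),
}

active, inactive = split_by_activity(last_seen, now, timeout=timedelta(minutes=5))
print("active:", active)
print("inactive:", inactive)
```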
You can then trigger a correlation rule off a critical device that has been inactive for a period of time by referring to the list. However, this is a simple posting to show that we have some content for SmartConnector monitoring as well as log source monitoring as standard. The rest of it can be easily extended as needed. Going back to my original comment though, ArcMC is the way forward and I strongly recommend that you look at it for SmartConnector monitoring and log source monitoring (it's simpler and easier).
Yes, there is an overhead. The standard content will use a bunch of rules to populate the active lists and data monitors for the dashboards. While this is fine in an environment with a relatively small number of devices, it can be quite excessive if you have a large number of devices feeding into ESM.
For example, if you set device status monitoring to, say, 5 minutes (a good level of inactivity for most devices), then you are going to generate something like 200-300 status events per device per day. If you have, say, 30,000 devices, then that's somewhere between 6 and 9 million events a day just in status monitoring! And remember there are a bunch of rule triggers on this too!
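The arithmetic behind that kind of estimate is easy to sketch in Python. The big assumption here (mine, not a documented connector behaviour) is that each device produces exactly one status event per monitoring interval. The function name is hypothetical, and the real volume will depend on how the connector actually reports.

```python
def status_events_per_day(interval_minutes: float, device_count: int) -> int:
    """Estimate daily status events, assuming one event per device per interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    minutes_per_day = 24 * 60
    events_per_device = minutes_per_day / interval_minutes
    return int(events_per_device * device_count)


per_device = status_events_per_day(5, 1)        # one device, 5-minute interval
fleet_total = status_events_per_day(5, 30_000)  # 30,000 devices, same interval
print(per_device, fleet_total)
```

Doubling the interval halves the volume, so this setting is one of the first things to look at if status events start to dominate.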
In the worst cases that I have seen, heavy device monitoring can cause around a 10% hit on ESM, which is high, but given poor tuning and active rules, it can happen. It is this that has driven the focus on putting this functionality (and more) into ArcMC. The fact that ArcMC uses a different engine for this is probably the best illustration of the reasoning behind this - I know of a couple of customers who have been using ArcMC 2.5 and 2.6 for monitoring 30,000+ devices with ease! I would caveat that, though: 2.5 does have a few performance issues at high device counts, but it's fixed in 2.6, which is due in a couple of weeks.
I know it might sound like I am pushing ArcMC here (and I am) but from a scalability, functionality and coverage point of view, ArcMC is the way forward.
See here for some information on what ArcMC provides for connector and device monitoring: