HOW TO : New Syslog FlexConnector - step 1
SmartConnectors are the cornerstone of the ArcSight solution and are essential for consistent processing of log data. The data has to be processed at some point: either early, at collection, or after the fact, once it has been stored. Either way, it still needs to be processed and indexed correctly.
FlexConnectors are the component you use to process previously unsupported log sources within the SmartConnector framework. There are many different types of FlexConnector, but in this example I am going to take a simple scenario with Syslog data only. First, let's take a look at the data as it's received:
Here we can see that some of the data is correct (the events with IP addresses parsed out) and there are a bunch of GET messages that are incorrect. If we take a closer look at one of these messages, we can see everything is placed into the Name field and there is no deviceProduct or deviceVendor information:
Some of the data is getting processed, since it's coming through a Syslog SmartConnector, but be aware that this isn't perfect and we do want the data parsed out. As you can see in the Name field, there is a bunch of data that is relevant:
For example, we can see there is a result code (200) and the total number of bytes transferred (10177). Because both sit in this single field, they are difficult for us to process, so we want to parse them out. Please note, other messages might appear with a deviceVendor and deviceProduct of Unix/Unix, indicating that they are being processed by the generic parser. This will depend on the scenario, and is another clear indicator that the data is not being parsed correctly.
First, we need to get the data OUT of the system so we can process it. There are a couple of ways to do this, but I will deal with a simple and straightforward way. Go to the connector that is receiving this data and turn on Preserve Raw Event:
Change the setting to Yes and make sure you press Apply or OK to save the change. It will take around 30 seconds for the setting to be applied, and after that you will see that the rawEvent field is being populated for new data:
If you are in any doubt, you can add the field to an Active Channel: right-click the columns, select Add Column, open the Root section and select Raw Event. In the example above, data received by a different connector is NOT sending raw events, which is fine, and is why some log entries show no raw event.
Personally, I would filter the Active Channel to only display unparsed events (you should have an Active Channel for this anyway - it's good practice). You can adjust your Active Channel to add these conditions - basically, any events that do not have a deviceVendor or deviceProduct. As per this filter:
This will filter the view and only display events that are not being parsed:
Now that we have the events, we can export them. Just right-click in the table and select Export:
Select a suitable destination and make sure you have the export data option set to Export for the field set - this means it will export all of the data for the events. Be aware that the Console pulls all of the data down before writing it to a CSV file. With only a few thousand events this takes around a minute or so. With hundreds of thousands, I would recommend tightening the filter so that you only have a few thousand events to export. Exporting more than, say, 10,000 takes a considerable amount of time, because the Console pulls ALL of the data in the fields for ALL of the log entries.
Once you have the CSV file, you can open it with something like Excel and look at the fields in question:
Go to the rawEvent field and take a look at the entry in question.
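If you would rather not pick through the export in Excel, the same step can be scripted. The following is only a minimal sketch in Python: the column names (name, deviceVendor, deviceProduct, rawEvent) and the two sample rows are assumptions made up for illustration, and the real header row depends on the field set you exported.

```python
import csv
import io

# Hypothetical stand-in for the exported CSV file. The column names and the
# rows are assumptions; check the header row of your own export.
SAMPLE_EXPORT = """name,deviceVendor,deviceProduct,rawEvent
GET /index.html 200 10177,,,<128>Jul 30 16:06:14 10.0.0.5 GET /index.html 200 10177
Accepted password,Unix,Unix,
"""


def unparsed_raw_events(csv_file):
    """Return rawEvent values for rows with no deviceVendor and no deviceProduct."""
    reader = csv.DictReader(csv_file)
    return [
        row["rawEvent"]
        for row in reader
        if not row["deviceVendor"] and not row["deviceProduct"] and row["rawEvent"]
    ]


raw_events = unparsed_raw_events(io.StringIO(SAMPLE_EXPORT))
for line in raw_events:
    print(line)
```

For a real export, open the file with open(path, newline="") and pass the file object in place of the StringIO.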
It's worth noting that this is the raw syslog data as it is received by the connector, which is why the message still has its header. This is the event as it looks before parsing, which is why some of it does not show up in the parsed event. Please note that Syslog messages are made up of the following:
Facility (priority value) | TimeStamp | Source address or host | message
In this case you can see the facility part is <128> (strictly, this is the syslog priority value, which encodes both facility and severity) and there is a time stamp followed by the source IP (yes, that is correct for my test network). The Syslog SmartConnector / FlexConnector framework will automatically parse this header out for us, so we don't need to worry about it. BUT, we do need to process the data after it, so the GET and the message itself. So copy the cells and paste them into your text editor of choice - please use something decent like Notepad++ or similar though:
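To make the header layout concrete, here is a small Python sketch that splits a line into those four parts and decodes the number in angle brackets. In BSD-style syslog that number is facility * 8 + severity, so <128> works out to facility 16 (local0) and severity 0. The sample line, including its timestamp and IP address, is invented for illustration, and the regex only covers this simple header shape.

```python
import re

# <PRI>timestamp host message, BSD-syslog style. Only a sketch: real-world
# headers vary more than this pattern allows.
SYSLOG_LINE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<message>.*)$"
)


def split_syslog(line):
    """Split one syslog line into header parts, or return None if it does not fit."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,  # upper part of the priority value
        "severity": pri % 8,   # lowest three bits of the priority value
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "message": match.group("message"),
    }


# Hypothetical sample line, not taken from the screenshots.
parsed = split_syslog("<128>Jul 30 16:06:14 10.0.0.5 GET /index.html 200 10177")
print(parsed)
```

The message part is the only piece the FlexConnector regex needs to deal with.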
Now we have the raw event data, but we want to remove the <facility> part. The timestamp can stay, as it will be processed automatically. Just do a search and replace: search for <128> and replace it with nothing. You should end up with the following:
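If the export contains more than one priority value, a plain search for <128> will miss some lines. As an alternative, here is a rough Python sketch that strips any leading <number> prefix. The sample lines are invented for illustration.

```python
import re

# Matches a leading syslog priority value such as <128> or <141>.
PRI_PREFIX = re.compile(r"^<\d{1,3}>")


def strip_pri(lines):
    """Remove the leading <number> prefix from each line, leaving the rest untouched."""
    return [PRI_PREFIX.sub("", line, count=1) for line in lines]


# Hypothetical sample lines.
cleaned = strip_pri([
    "<128>Jul 30 16:06:14 10.0.0.5 GET /index.html 200 10177",
    "<141>Jul 30 16:06:15 10.0.0.5 GET /about.html 200 2986",
])
print("\n".join(cleaned))
```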
Save the file and remember where you have it. Now load up the FlexConnector framework. You can do this from ANY SmartConnector, but I am doing this from Windows so I can show you easily what is going on. FlexConnectors created on Linux or Windows work on other platforms, so don't worry about this. Go to the location where you have installed a SmartConnector and, from the bin folder, run the arcsight script (arcsight.bat on Windows) with the regex command, that is: arcsight regex
When the tool loads, select New Flexagent and give it a name:
Then select load log file and select the file you saved before without the facility data in it:
Since this is a Syslog parser, we want to find the Treat As Syslog Subagent option and make sure it is ticked:
Next click the forward arrow in the toolbar at the top and advance to the second message. You will see above that it has the date and time in the message, but as you move to the second one, it will now process it automatically. So you should see the following:
The regex that we will use to process this is set to match all - this is .* - but that's not going to help us yet. We need to parse this out. So we will step through this one part at a time:
For the first part of the message, we want to parse out the type of the call - in this case it's the GET part. We can use the \S regex class for this: it matches any single non-whitespace character, so it will not match a space, and the space acts as our delimiter. Adding + makes it one or more, so \S+ matches a run of non-whitespace characters and stops when it gets to a space. Simple. To capture it as a token, we put it in parentheses - so we end up with (\S+) - and the tool shows what it is matching with a highlighted yellow background.
Same again for the next part of the message - we want to parse out the URL that is being used. So again, we use another (\S+) to do this processing.
And again to complete the full set. Once you have done this, the full message is highlighted yellow and you can see the full line is processed. Now select File and Save so that we can re-process the data. You will see the tokens appear in the table below - this is what we will work on next:
While you can use the default names, it makes a lot of sense to name them at this point. I am giving them sensible ones. Please note, do not call them the actual field names from the schema, as this will cause errors! Be careful.
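To see the same idea outside the tool, here is a minimal Python sketch of chaining (\S+) groups. It assumes the message has exactly four space-separated parts (method, URL, result code, bytes). The sample message and the variable names are made up for illustration, so adjust the number of groups to your own log format.

```python
import re

# One (\S+) group per space-separated part of the message. Four groups is an
# assumption for this sample, not a rule.
TOKEN_REGEX = re.compile(r"(\S+) (\S+) (\S+) (\S+)")

# Hypothetical message body, i.e. what is left after the syslog header.
sample = "GET /index.html 200 10177"

match = TOKEN_REGEX.fullmatch(sample)
method, url, result_code, byte_count = match.groups()
print(method, url, result_code, byte_count)
```

Every captured value is still a string at this point, which is why the numeric fields need the extra care described next.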
Next, we can select the tokens and drag and drop them into the field mapping part. Simply drag each one over and select the field you want to map it to. You should have some idea of what to map them to. For further information, do check out the FlexConnector guide.
We can be pretty clever here and simply drag and drop the fields that we know are OK. Just like in programming languages, we need to be cautious about the type of the field, as we can't put a string in an integer field or a string in an IP address field! So be sensible, though it will throw errors if you try to do this. We know these fields are string based, so we are OK. But obviously the bytes field (in the example above it's 2986) is going to be a number - so we need to process this carefully.
At the bottom of the window you will see the text settings that are being created as we do things - this is creating the properties file we need. You can manually edit this for advanced operations and you can see some fields we want to process here. Since we want to put the bytes token into the bytesIn field in the schema, we can use a processor for this:
I am using the __safeToInteger operator here - it is documented in the FlexConnector manual, so please check it out there. This is a neat little trick. If you are in any doubt about whether a field might contain something else and hence throw an error, it's safer to use these __safeTo operators. There are the following ones:
And there are also the __oneOf operators too. Basically, the __safeTo operators prevent you from applying an operation to a piece of data that is incorrect: should the conversion not work, the field is simply assigned NULL, in a safe manner. Try to use these where you can! Now we can see where we are:
We must also give it a deviceVendor and deviceProduct. These values are fixed, so we can use the __stringConstant operator to set them. See the almost finished parser here:
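As a rough analogy only (this is Python, not how the connector implements __safeToInteger), the behaviour described above can be sketched as a conversion that returns None instead of raising an error when the value is not a number. The function name safe_to_int is made up for this example.

```python
def safe_to_int(value):
    """Convert value to an int, or return None if it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Bad or missing input: hand back None instead of failing.
        return None


print(safe_to_int("2986"))  # a clean numeric token
print(safe_to_int("-"))     # a placeholder some logs use for "no value"
```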
In my example, I also added something descriptive to the name field too - but you will see this when we run the connector. Make sure you save the parser and then locate where you saved it. The file will be called <something>.sdkrfilereader.properties. Copy it to the correct folder on the Syslog SmartConnector itself. I am running the Syslog SmartConnector on my Windows workstation, so you can see the folder here:
You should drop the file in the [ARCSIGHT HOME]/current/user/agent/flexagent/syslog folder. And when you save it, change the name to reflect the one above - you should have something like <yourname>.subagent.sdkrfilereader.properties
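Since the file name and location are both critical, it can help to build the target path in one place. This is just a sketch in Python: the install root /opt/arcsight and the parser name weblog are hypothetical placeholders, and only the folder layout and the .subagent.sdkrfilereader.properties suffix come from the step above.

```python
from pathlib import Path


def deployed_parser_path(arcsight_home, parser_name):
    """Build the path where the renamed syslog sub-agent parser file should live."""
    folder = Path(arcsight_home) / "current" / "user" / "agent" / "flexagent" / "syslog"
    return folder / f"{parser_name}.subagent.sdkrfilereader.properties"


# Hypothetical install root and parser name.
target = deployed_parser_path("/opt/arcsight", "weblog")
print(target)
```

From there, shutil.copyfile(saved_file, target) would do the copy and the rename in a single step, where saved_file is wherever the tool saved your parser.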
Once you have done this, restart the SmartConnector and check in the agent.log file for the following entry:
You will see an entry showing that it has loaded the parser and it is ready to run. If you do not get this message at start up, there is an issue and the connector is not reading the file. This is normally because you have copied the file to the wrong location or given it the incorrect file name - both are critical.
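A quick way to check is to search the log for your parser's file name rather than for an exact message. The Python sketch below does only that. It makes no assumption about the wording of the real log entry, and the sample lines are obvious placeholders, not real agent.log output.

```python
def lines_mentioning(log_lines, parser_file_name):
    """Return the log lines that contain the parser file name."""
    return [line.rstrip("\n") for line in log_lines if parser_file_name in line]


# Placeholder text only. Real agent.log entries will look different.
sample_log = [
    "placeholder line about something else\n",
    "placeholder line mentioning weblog.subagent.sdkrfilereader.properties\n",
]

hits = lines_mentioning(sample_log, "weblog.subagent.sdkrfilereader.properties")
print(hits)
```

In practice you would pass an open file object for agent.log instead of the sample list. An empty result after a restart points to the location or naming problems described above.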
Once you have this correct, you can go back to view the messages in an Active Channel and now see them parsed:
And when we look at the event in detail we see the following:
So we now have parsed data going in and the Syslog SmartConnector has a FlexConnector added to it. Please note the following additional points:
- We are currently missing categorization - I will deal with this at a later point
- We don't have what is called a deviceEventClassId assigned which will cause issues when we attempt to add any categorization
- The message isn't particularly meaningful and we will want to improve this
- We don't have a sourceAddress applied for the data - it is in the deviceAddress which is usually for the referring source
- We haven't applied any priority to this as it has simply selected the default priority of 2
We will deal with these later, but the important part is that we now have processed data and fixed the incorrect parsing.
More to come!
It's a per-connector setting that you need to change. I covered this at the start of the article.
You need to go to the connector in question and turn on the 'preserve raw event' option for that connector. This will then take the raw event and place it in the rawEvent field, where you can take a closer look at it and understand what is going on. This is relevant to most log sources, but be aware that if you do this for a DB query connector, the data is structured anyway, so there is no raw event! But since this is focused on syslog, you should get this. I recommend setting 'preserve raw event' and going from there.
Great article! I also wondered how to deal with the "facility" and where the header ended and the actual message started. Can't wait to try my next flex!
In the directory C:\arcsight\windows\current\bin I don't have the tool arcsight regex.
There are arcsight, checkIgnoreJar, reutil and runagentsetup. That is all.
It's a command line option. Just run it - you call the arcsight.bat file and then pass it the command regex. It's NOT a separate program that you call. Just run it from the bin folder.
Great Information.. hats off!! @pbrettle
I'm facing a parsing issue with F5 Big IP (version 12.1.2).
The Name field shows as: Unsupported Event, and all the information ends up in the Message field. How can I sort out this issue? Need your valuable suggestions...
SAMPLE RAW LOGS:
<141>Jul 30 16:06:14 slot1/HQ-prod-dmz-01 notice apmd: 01490005:5: /Common/Exchange_2010.app/exch_access:Common:cb0d5d07: Following rule 'fallback' from item 'SSO Credential Mapping' to ending 'Allow'
<141>Jul 30 16:06:14 slot1/HQ-prod-dmz-01 notice apmd: 01490102:5: /Common/Exchange_2010.app/exch_access:Common:cb0d5d07: Access policy result: LTM+APM_Mode