Regex File Flex Connector for JSON events

Hello, I have a problem.

I'm using the Regex File Flex Connector to process JSON events.

The configuration file is written correctly, the regular expression has been tested in the relevant services.

And at the output, in the agent.log file for my connector, I see an error:

[2024-06-20 11:32:02,720][WARN ][com.arcsight.agent.sdk.d.u] [parseTokensNow]
Message [{] did not match the common regular expression [\{\s*"timestamp":\s*
"([^"]+)",\s*"event_type":\s*"([^"]+)",\s*"source_ip":\s*"([^"]+)",\s*"
destination_ip":\s*"([^"]+)",\s*"user_agent":\s*"([^"]+)",\s*"status_code"
:\s*(\d+),\s*"request_method":\s*"([^"]+)",\s*"request_url":\s*"([^"]+)",
\s*"response_size":\s*(\d+)\s*\}], ignoring...

My expression is \\{\\s*"timestamp":\\s*"([^"]+)",\\s*"event_type":\\s*"([^"]+)", \\s*"source_ip":\\s*"([^"]+),\\s*"destination_ip":\\s*"([^"]+),\\s*"user_agent ":\\s*"([^"]+),\\s*"status_code":\\s*(\\d+),\\s*"request_method":\\s*"([^ "]+),\\s*"request_url":\\s*"([^"]+),\\s*"response_size":\\s*(\\d+)\\s*\ \}

Example event

{
"timestamp": "2024-06-19T10:15:30Z",
"event_type": "access_log",
"source_ip": "192.168.1.1",
"destination_ip": "10.0.0.1",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
"status_code": 200,
"request_method": "GET",
"request_url": "/api/data",
"response_size": 1024
}

P.S. a similar approach works perfectly for CEF format

What is my mistake ? Maybe I should choose another connector ?

Thanks in advance

Bohdan
  • Verified Answer

    Is this using a syslog connector? If it's not syslog, I would suggest using the JSON file parser instead of regex.  

    If it is syslog, it's possible the events aren't coming in as a single line and not matching your regex.  I would suggest to set preserve raw events to true so you can verify if you're getting the full JSON event or not.  

    If you are getting the full JSON event, an alternative to fully writing it in regex is to write the first part in regex, then pass the entire event to an extraprocessor and utilize a JSON file parser to properly parse it.  So you could start with something like \\(\\s+"timestamp.*  and then have the full token sent to the JSON parser to do the rest of the work.