Using .* several times in a FlexConnector's regex
FlexConn_DevGuideConfig.pdf contains the following statement:
.* is not recommended. Never use more than one of these in a regular expression, preferably at the end.
We know that using .* in the middle of a regex causes backtracking.
But if we use .* several times in a FlexConnector's regex and use that FlexConnector to read a large log file (about 5 million lines),
what will happen?
My guess is that this is specifically a performance-centric instruction.
I would say it's even more important to be careful with .* when matching multi-line input in DOTALL (single-line) mode, where . can also match newline characters.
Unless you specifically want .* to make your regex backtrack, the lazy variant .*?, which steps forward through the string, may be more efficient. Another option, depending on the situation, is a negated character class (for example, with colon delimiters: "[^:]*").
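To make the difference concrete, here is a minimal Java sketch (FlexConnectors run on a JVM, so java.util.regex is a reasonable stand-in, but I have not checked which regex options the connector itself applies). The class name DotStarDemo, the three patterns, and the sample line are all made up for illustration; they are not taken from any real squid parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotStarDemo {
    // Greedy: each .* first runs to the end of the line, then gives characters back.
    static final Pattern GREEDY = Pattern.compile("^(.*):(.*):(.*)$");
    // Lazy: each .*? grows one character at a time until the next ':' matches.
    static final Pattern LAZY = Pattern.compile("^(.*?):(.*?):(.*)$");
    // Negated class: [^:]* can never run past a colon, so there is little to undo.
    static final Pattern NEGATED = Pattern.compile("^([^:]*):([^:]*):(.*)$");

    // Returns the three captured fields, or null if the line does not match.
    static String[] split(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "alpha:beta:gamma:delta"; // hypothetical colon-delimited record
        System.out.println(String.join(" | ", split(GREEDY, line)));  // alpha:beta | gamma | delta
        System.out.println(String.join(" | ", split(LAZY, line)));    // alpha | beta | gamma:delta
        System.out.println(String.join(" | ", split(NEGATED, line))); // alpha | beta | gamma:delta
    }
}
```

Note that the greedy version does not only do more work, it also splits the line differently once a field contains an extra delimiter, which is another reason to keep a single .* at the very end.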
Thank you very much.
In fact, we are using a FlexConnector to read a squid log file, but there is a very strange symptom. When we move a large squid log file (about 2 million lines) into the directory the FlexConnector reads from, the FlexConnector at first sends log records to ESM at a high EPS (about 5000), but after sending several hundred thousand log records, the EPS suddenly drops to about 250.
Now we are trying to figure out why this happens.
The connector's log shows that the garbage collector was started repeatedly at very short intervals and could not free a large amount of memory.
Because we used .* several times in the regex, we suspect this may be the reason.
I never use .* unless I want to end my regex there, so it should go at the end to avoid constant backtracking.
If the regex is written poorly, the FlexConnector will definitely perform poorly. At the end of the day it's a JVM, and GC will surely affect its performance.
I wrote a squid syslog subagent, which is shared here: https://protect724.arcsight.com/docs/DOC-3194
I wrote this to remove the dependency on sharing log folders; file readers are a headache in comparison to syslog. My squid proxies were running on Linux, and luckily Linux has a syslog feature, so I built this subagent parser.
Feel free to use this parser, or to borrow from its regular expressions.
We added the following 2 lines into ….\current\user\agent\agent.properties
and all the symptoms disappeared. The EPS has now improved to as high as 19,000.
For now, 19,000 EPS is high enough for us.
But the network throughput is below 1 MByte/s, so if I have time, I will test the case of
http.transport.threadcount > 6.