
Using .* several times in a FlexConnector's regex
Hi,
FlexConn_DevGuideConfig.pdf contains the following statement:
".* is not recommended. Never use more than one of these in a regular expression, preferably at the end."
We know that using .* in the middle of a regex causes backtracking.
But what happens if we use .* several times in a FlexConnector's regex and use that FlexConnector to read a large log file (about 5 million lines)?
Thanks


よ しゅん,
My guess is that this is specifically a performance-centric instruction.
I would say it's even more important to be careful with .* when matching across multiple lines with dot-all (single-line) mode enabled, where . can also match newline characters.
Unless you specifically want the .* to make your regex backtrack, using the lazy variant .*? to step forward through the string may be more efficient. Another option, depending on the situation, is a negated character class (for example, if you had colon delimiters: "[^:]*").
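To illustrate the point about negated character classes, here is a minimal Java sketch (FlexConnector regexes use Java regex syntax). The class name, the colon-delimited sample lines, and the three-field layout are made up for illustration and are not taken from any real connector configuration. With the greedy pattern, each .* first runs to the end of the line and then backtracks to find a colon; with [^:]*, each field simply stops at the next colon.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of any ArcSight API.
class DelimitedRegexSketch {
    // Greedy version: every .* grabs the rest of the line, then gives characters back.
    static final Pattern GREEDY = Pattern.compile("^(.*):(.*):(.*)$");
    // Negated-class version: a field can never swallow a delimiter.
    static final Pattern NEGATED = Pattern.compile("^([^:]*):([^:]*):([^:]*)$");

    // Returns the three captured fields, or null if the line does not match.
    static String[] fields(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Well-formed line: both patterns capture the same fields.
        System.out.println(Arrays.toString(fields(GREEDY, "alice:GET:200")));  // [alice, GET, 200]
        System.out.println(Arrays.toString(fields(NEGATED, "alice:GET:200"))); // [alice, GET, 200]

        // Line with an extra delimiter: the greedy pattern backtracks into a
        // surprising split, while the negated class rejects the line.
        System.out.println(Arrays.toString(fields(GREEDY, "a:b:c:d")));  // [a:b, c, d]
        System.out.println(Arrays.toString(fields(NEGATED, "a:b:c:d"))); // null
    }
}
```

Besides avoiding the backtracking, the negated class makes the pattern fail on malformed lines instead of silently shifting field boundaries.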
Harold

Hi, Harold
Thank you very much.
In fact, we are using a FlexConnector to read a Squid log file, and we see a very strange symptom. When we move a large Squid log file (about 2 million lines) into the directory that the FlexConnector reads from, the connector initially sends log records to ESM at a high EPS (about 5000), but after sending several hundred thousand records, the EPS suddenly drops to about 250.
We are now trying to figure out why this happens.
The connector's log shows that the garbage collector was started repeatedly at very short intervals and could not release much memory.
Because we used .* several times in the regex, we suspect this may be the reason.
Best regards


I never use .* unless I want to end my regex there, so it should be at the end to avoid constant backtracking.
If the regex is written poorly, the flex will definitely perform poorly. At the end of the day it runs in a JVM, and GC will surely affect its performance.
I wrote a Squid syslog subagent, which is shared here: https://protect724.arcsight.com/docs/DOC-3194
I wrote it to remove the dependency on sharing log folders; file readers are something of a headache compared to syslog. My Squid proxies were running on Linux, which fortunately has syslog, so I built this subagent parser.
Feel free to use this parser, or to borrow from its regular expressions.
Cheers !!

Hi, Khan-sann
Thank you. I will compare yours with ours.
Best regards

Hi, Khan-sann
It seems that the FlexConnector sends cached log records at a very low EPS, even though there are no live events at the same time.
Best regards

We added the following 2 lines into ….\current\user\agent\agent.properties
transport.loggersecure.multithreaded=true
http.transport.threadcount=6
and all the symptoms disappeared. The EPS has now improved to as high as 19,000.
Ref: https://protect724.arcsight.com/docs/DOC-1198#comment-1960
Thanks.


Well, then this is a connector performance optimization topic, whereas we were discussing FlexConnector best practices.
Cheers !!

Hi Khan-sann
For now, 19,000 EPS is high enough for us.
But the network throughput is still below 1 MB/s, so if I have time, I will test the case of http.transport.threadcount > 6.