Absent Member.

Using .* several times in a FlexConnector's regex

Hi,

FlexConn_DevGuideConfig.pdf contains the following statement:

   .* is not recommended. Never use more than one of these in a regular expression, preferably at the end.

We know that using .* in the middle of a regex causes backtracking.
But what happens if we use .* several times in a FlexConnector's regex and use that FlexConnector to read a large log file (about 5 million lines)?

Thanks

Cadet 2nd Class

よ しゅん,

My guess is that this is specifically a performance-centric instruction.

I would say it's even more important to be careful with .* when using DOTALL (single-line) regex mode, where . can match newline characters.

Unless you are specifically wanting the .* to cause your regex to backtrack, using the lazy variant .*? to step forward through the string may be more efficient. Another option depending on the situation is using a negated character class (for example, if you had colon delimiters: "[^:]*").
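To illustrate the three options above, here is a minimal Python sketch. Python's re module is a backtracking engine, as is java.util.regex, which I assume is what the FlexConnector uses. The sample line and the patterns are made up for illustration and are not the real squid format.

```python
import re
import time

# Made-up colon-delimited record (NOT the real squid log format).
LINE = "1286536309:192.168.0.5:TCP_MISS:GET:http://example.com/"

GREEDY = re.compile(r"(.*):(.*):(.*):GET:.*")
LAZY = re.compile(r"(.*?):(.*?):(.*?):GET:.*")
NEGATED = re.compile(r"([^:]*):([^:]*):([^:]*):GET:.*")


def fields(pattern, line):
    """Return the captured groups, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None


def seconds_to_reject(pattern, line, repeat=20):
    """Time how long the pattern takes to give up on a non-matching line."""
    start = time.perf_counter()
    for _ in range(repeat):
        pattern.match(line)
    return time.perf_counter() - start


# All three patterns extract the same fields from a well-formed line.
for p in (GREEDY, LAZY, NEGATED):
    print(fields(p, LINE))

# A line with many delimiters but no ":GET:" makes every .* retry
# each possible split point before the match finally fails.
BAD_LINE = "x:" * 60
for name, p in (("greedy", GREEDY), ("lazy", LAZY), ("negated", NEGATED)):
    print(name, seconds_to_reject(p, BAD_LINE))
```

On the non-matching line the negated class typically gives up much sooner than either .* variant, because [^:]* cannot slide across a delimiter. Exact timings depend on the machine and the regex engine, so treat the numbers as relative only.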

Harold


Absent Member.


Hi, Harold

Thank you very much.

In fact, we are using a FlexConnector to read a squid log file, and we see a very strange symptom. When we move a large squid log file (about 2 million lines) into the directory the FlexConnector reads from, the FlexConnector at first sends log records to ESM at a high EPS (about 5,000), but after several hundred thousand records the EPS suddenly drops to about 250.

We are now trying to figure out why this happens.

The connector's log shows that the garbage collector was started repeatedly at very short intervals and could not release a large amount of memory.

Because we used .* several times in the regex, we suspect that this is the cause.

Regards

Cadet 2nd Class

I never use .* unless I want to end my regex there, so it should go at the end to avoid constant backtracking.

If the regex is written poorly, the FlexConnector will definitely perform poorly. At the end of the day it is a JVM, and GC will surely affect its performance.

I wrote a squid syslog subagent, which is shared here: https://protect724.arcsight.com/docs/DOC-3194

I wrote it to remove the dependency on shared log folders; file readers are something of a headache compared with syslog. My squid proxies were running on Linux, which fortunately has syslog, so I built this subagent parser.

If you want, use this parser or borrow from its regular expressions.

Cheers !!

Absent Member.

Hi, Khan-sann

Thank you.  I will compare yours with ours.

Regards

Absent Member.

Hi, Khan-sann

It seems that the FlexConnector sends cached log records at a very low EPS, even though there are no live events arriving at the same time.

Regards

Absent Member.

We added the following two lines to ….\current\user\agent\agent.properties:

transport.loggersecure.multithreaded=true

http.transport.threadcount=6

and all the symptoms disappeared. The EPS has now improved to as high as 19,000.

Ref: https://protect724.arcsight.com/docs/DOC-1198#comment-1960

Thanks.

Cadet 2nd Class

Well then, this is a connector performance optimization topic... and we were discussing FlexConnector best practices.

Cheers !!

Absent Member.

Hi Khan-sann

19,000 EPS is now high enough for us.

But the network throughput is below 1 MB/s, so if I have time, I will test the case of

http.transport.threadcount > 6.
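As a rough sanity check on those two numbers, here is a back-of-the-envelope sketch. It assumes 1 Mbyte = 1,000,000 bytes and that the EPS and throughput figures were measured at the same time; neither is stated in this thread, and the function name is made up for illustration.

```python
def max_bytes_per_event(eps, bytes_per_second):
    """Upper bound on the average on-the-wire size of one event."""
    return bytes_per_second / eps


# 19,000 EPS over a link carrying less than 1 Mbyte/s (assumed 10**6 bytes)
print(round(max_bytes_per_event(19_000, 1_000_000), 1))  # 52.6
```

If those assumptions hold, each event averages under roughly 53 bytes on the wire. Whether the link still has headroom depends on its capacity, which is not given here.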
