Absent Member.
Problem with Regular Expression and accents

Hello,

I am trying to use a regular expression to capture the id of a tag. For example, I want the id of the tag which contains the value Microfocus:

 

<option id="35930122">Google</option><option id="223058b5" atribute="verificação">Microfocus</option><option id="e1f9587e">Borland</option>

 

This is the regular expression I created: id="([^"]*)"[^>]*>Microfocus<

In Silk Performer, it would be: id=\"\([^\"]*\)\"[^>]*>Microfocus<

 

When I run this expression in online regex validators it works, but it fails in Silk Performer. I suppose the problem is how non-ASCII characters are handled when matched against a [set] of characters (in my case, the set [^>]*). If I run this test with "verificacao" instead of "verificação", it works. Is this a problem with regular expression evaluation in Silk Performer, or am I doing something wrong? Can you help me?

I've found that Silk Performer's ability to parse regular expressions is extremely limited: documentation.microfocus.com/.../index.jsp

If you absolutely need regular expression matching, you may want to capture the entire page as a string and pass it to a Java applet for the heavy lifting.
Absent Member.
Thanks David. I just wanted to let the creators of the regex functionality know about this strange behavior. Is there a way to file a bug, or something like that, with the Silk Performer team?
Micro Focus Expert

Hi,

Can you please apply the function FromEncoding to the source before passing it to the regular expression functions?

The following example works for me:

    // Convert the source with FromEncoding before matching.
    SetEncoding("UTF-8");
    sSource := FromEncoding("<option id=\"35930122\">Google</option><option id=\"223058b5\" atribute=\"verificação\">Microfocus</option><option id=\"e1f9587e\">Borland</option>");
    // Group 1 captures the id of the option whose text is Microfocus.
    sRegex := "id=\"\([^\"]*\)\"[^>]*>Microfocus<";
    StrRegexCompile(hRegex, sRegex);
    match := StrRegexExecute(hRegex, sSource);
    if match then
      // Copy the captured group into sTarget.
      StrRegexSubst(hRegex, "\1", sTarget, STRING_COMPLETE);
      print("match: " + sTarget);
    else
      print("no match");
    end;

Cheers,
Thomas

Absent Member.
Thanks a lot, Thomas! It solved my problem. I can't believe I made such a basic mistake. Deleting the Java code 😄
Absent Member.
Thomas, I was analyzing the result in other use cases, and I think the answer is still not the one I am looking for. Let me explain: consider that the id is "verificação" instead of "223058b5". When you call FromEncoding, it converts all accented characters to '?'. The match will work, but the captured value is modified in a way that I can't convert back to the original (or I don't know how). Another problem arises when the accented character matters to the pattern itself. Since the character was changed to '?', it is effectively ignored during evaluation. Example: change Microfocus to Microfócus in both sSource and sRegex.
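
To illustrate why such a substitution cannot be undone, here is a small Java sketch. It does not reproduce FromEncoding and says nothing about how Silk Performer implements it; it only assumes the conversion behaves like encoding text into a character set that lacks the accented characters. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encodes text as US-ASCII and decodes it again. String.getBytes(Charset)
    // replaces every character the charset cannot represent with '?'.
    // Assumption: this only mimics the '?' substitution described above.
    static String toAsciiAndBack(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "verificação", written with Unicode escapes so the source file
        // encoding does not matter.
        String original = "verifica\u00e7\u00e3o";
        String converted = toAsciiAndBack(original);
        System.out.println(converted);                  // prints "verifica??o"
        System.out.println(converted.equals(original)); // prints "false"
    }
}
```

Both accented characters collapse to the same '?', so nothing in the converted string says which character was there originally.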
Micro Focus Expert

Yeah, you are right, there are some limitations in place.

Have you tried using our Java Framework with java.util.regex.*? That might be a workaround for you.

If you would like to try it out, I could attach my demo project.

It would also be possible to put the functionality in a custom DLL and reuse it within the script.
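
As a rough sketch of the java.util.regex route (this is not the demo project offered here, and the class and method names are hypothetical), the following plain Java shows the same pattern working when both the id and the option text contain accented characters. How such a class is invoked from a script through the Java Framework is not shown and would depend on the project setup.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionIdFinder {
    // Returns the id of the first <option> whose text equals label, if any.
    // Pattern.quote makes the label match literally, accents included.
    static Optional<String> findId(String html, String label) {
        Pattern p = Pattern.compile(
                "id=\"([^\"]*)\"[^>]*>" + Pattern.quote(label) + "<");
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Unicode escapes stand for "verificação" and "Microfócus" so the
        // source file encoding does not matter.
        String html = "<option id=\"35930122\">Google</option>"
                + "<option id=\"verifica\u00e7\u00e3o\" atribute=\"verifica\u00e7\u00e3o\">"
                + "Microf\u00f3cus</option>"
                + "<option id=\"e1f9587e\">Borland</option>";
        Optional<String> id = findId(html, "Microf\u00f3cus");
        // The captured id keeps its accented characters.
        System.out.println(id.isPresent() && id.get().equals("verifica\u00e7\u00e3o")); // prints "true"
    }
}
```

Java strings are Unicode, so the accented characters pass through the match without any conversion step.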

Cheers,
Thomas

 

 

 

Absent Member.
Yes Thomas, please attach it. Are there any limitations when using Java code instead of plain BDF?

Can I consider this problem with BDF's regex a bug? If so, how can I inform the Silk Performer team?
Micro Focus Expert

Hi again,

Attached is RegularExpressionsJF.ltz, uploaded as 4555.RegularExpressionsJF.zip (it actually has a .zip extension, so just rename it to .ltz; I don't know why it got that number in front, but I guess it should not be a problem).

There are no real limitations, but you need to have a Java JDK installed and configured in Silk Performer's Java Settings, and there is a small overhead because the Silk Performer runtime needs to load the Java runtime to execute the Java code.

And let's consider the problem a "limitation". As I am part of the Silk Performer team, you have already informed us, and I'll go and enter a ticket in our tracking system ;).

Cheers,
Thomas

Absent Member.
I think this sums it up.

Thank you Thomas,
Pedro Barros
Thanks kue, I look forward to looking over your demo project!
Absent Member.

This may be off topic, but it is related to regular expressions. Is there an easy way to support regex in BrowserWaitForProperty()?

For example: BrowserWaitForProperty("//div[@textContents='regex']", "text", "regex", 10000);
