
SBM ModScript, Part 10 - Regular Expressions

ModScript has the ability to execute regular expressions on strings. The interface for this is the Regex class. In the following example, we create a regular expression that will match any string that starts with "t" (the default options make this case-insensitive). We will then read a full list of users and fill a Vector with users whose loginid starts with "t". Finally, we iterate our Vector and write the users we found to the output stream.

def AppRecord::GetLogin() {
	return this.GetFieldValue("LOGINID").to_string();
}

var users = Ext.CreateAppRecordList( Ext.TableId("TS_USERS") );
users.Read();

var regex = Regex();
regex.Compile( "^t" );

var out = [];
filter( users, bind( fun( user, innerRegex ){ return innerRegex.Matches( user.GetLogin() ); }, _, regex ), back_inserter( out ) );

for_each( out, fun( user ){ Ext.WriteStream( user.GetLogin() ); } );

Step by step:

  • The first part should look familiar: we talked about adding methods to an existing class in Part 3. Here, we add a method that makes it easy to pull the LOGINID out of a user record as a string. 
  • The call to Ext.CreateAppRecordList(), passing in the return value of a call to Ext.TableId(), should be pretty familiar from previous examples. In this case, we are building a list which will let us read rows from the Users table. Then, we call the AppRecordList.Read() method to read the entire Users table (this might not be a great idea on systems with lots of users, but it works well in my simple example). 
  • We create a Regex() and compile it with a simple "starts with t" rule. Since we pass no options into Regex.Compile(), we get the default, which is case-insensitive.
  • We create an output Vector to hold the records that match our regular expression.
  • We invoke the "filter" algorithm. In Part 5 we talked about algorithms, including filter, bind, and back_inserter.
    • filter() - Iterates the container passed as its first argument ("users") and invokes the function passed as its second argument on each object. For each object where that function returns true, it invokes the function passed as its third argument.
    • users - this is the container to iterate.
    • bind( fun( user, innerRegex ){ return innerRegex.Matches( user.GetLogin() ); }, _, regex ) - bind returns a function for filter to invoke when iterating the users container
      • When the returned function is invoked by the "filter" algorithm, passing in a user from the users container, that value will be passed to the function as the first parameter, which is indicated by the underscore in the call to bind()
      • Also, bind will pass our regex object to the inner function as the second parameter, indicated by the "regex" after the "_" passed to bind.
      • Finally, the inner function will use the regex to indicate to the "filter" call whether this user matches our regular expression.
    • back_inserter( out ) - Adds the matched values to the "out" Vector
      • When the filter function finds a match, it invokes this function, which will append the user object onto the "out" Vector.
  • We assume you want to do something with the filtered list of users. In this case, I invoke the "for_each" algorithm, which will invoke my lambda function on each item in the Vector. In this case, it will print out the matching users' LOGINIDs.
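To experiment with the filter/bind pattern without reading the Users table, the same call can be pointed at a plain Vector of strings. This is a minimal, untested sketch: the names in the Vector are made up for illustration, and it uses only calls that already appear in this article.

```chaiscript
var regex = Regex();
regex.Compile( "^t" ); // default options, so matching is case-insensitive

// Hypothetical sample data standing in for the login IDs.
var names = [ "tom", "Tina", "bob" ];

var out = [];
// Same shape as the user example: "_" is the current item, regex is bound as the second parameter.
filter( names, bind( fun( name, innerRegex ){ return innerRegex.Matches( name ); }, _, regex ), back_inserter( out ) );

for_each( out, fun( name ){ Ext.WriteStream( name ); } );
```

With the default case-insensitive option, both "tom" and "Tina" should end up in the output, while "bob" is skipped.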

 

Regular Expressions With Groups

Above, we saw a simple regular expression and a simple call to Regex.Matches(). However, we can also use more complex regular expressions, including group capture.

var regex = Regex();
regex.Compile( "(\\d+)(\\w+)" );

regex.Matches( "123abc" );
for ( var i = 0; i < regex.GroupCount(); ++i ) {
	Ext.WriteStream( regex.GroupVal(i) );
}

In this example, we have a regular expression with two groups: it expects 1 or more digits, followed by 1 or more word characters. When we invoke Regex.Matches() on a string, the Regex object remembers the groups that it matched, and you can access them via Regex.GroupCount() and Regex.GroupVal(). Regex.GroupVal( 0 ) will always be the text matched by the entire expression. After that, the rest of the groups are indexed in the order they were captured in the string. The above example gives the output:

123abc
123
abc
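Since Regex.Matches() reports whether a match was found, it can guard the group access. The following is a small, untested sketch of pulling two named pieces out of a string with groups; the pattern and the "color=blue" input are invented for illustration, and only calls shown in this article are used.

```chaiscript
var regex = Regex();
regex.Compile( "(\\w+)=(\\w+)" ); // group 1: key, group 2: value

// "color=blue" is a made-up input for illustration.
if ( regex.Matches( "color=blue" ) ) {
	Ext.WriteStream( regex.GroupVal( 1 ) ); // the key, "color"
	Ext.WriteStream( regex.GroupVal( 2 ) ); // the value, "blue"
}
```

Skipping index 0 here leaves out the full matched text and writes only the two captured groups.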

 

Regular Expressions: MatchesAgain

After calling Regex.Matches(), you can continue finding matches by invoking Regex.MatchesAgain(). Below, we'll print out each letter in the string, one by one. The regular expression is a negated character class that excludes digits (\d) and non-word characters (\W), so it matches only word characters that are not digits. We do not need grouping parens because the Regex.GroupVal( 0 ) call will always give us the full string that was matched by the regular expression. Note that ChaiScript does not give us a do-while loop, so instead you'll see a while(true)-if-break loop, which follows the same pattern.

var regex = Regex();
regex.Compile( "[^\\d\\W]" );

if ( regex.Matches( "123abc" ) ) {
	while ( true ) {
		Ext.WriteStream( regex.GroupVal( 0 ) );
		if ( !regex.MatchesAgain() ) { break; }
	}
}
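The same loop can gather every match into a Vector for later use instead of writing each one immediately. This is an untested sketch under a couple of assumptions: the input string is made up, and push_back is the standard ChaiScript Vector method rather than something introduced in this article.

```chaiscript
var regex = Regex();
regex.Compile( "\\d+" ); // one or more digits

var numbers = [];
// The input string is made up for illustration.
if ( regex.Matches( "order 12, qty 7, bin 305" ) ) {
	while ( true ) {
		numbers.push_back( regex.GroupVal( 0 ) ); // full text of the current match
		if ( !regex.MatchesAgain() ) { break; }
	}
}

for_each( numbers, fun( n ){ Ext.WriteStream( n ); } );
```

If MatchesAgain() continues after the previous match as described, the Vector should end up holding 12, 7, and 305, in that order.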

 

Regular Expressions: ReplaceAll

Finally, the ModScript Regex class can use regular expression matching to replace values in strings, returning a modified string with all matches replaced. You can also use $ notation to insert the matched value, or a captured group, into the replacement ($0 is the entire matched value, $1 is the first captured group in the match, and so on). 

var regex = Regex();
regex.Compile( "[^\\d\\W]" );

Ext.WriteStream( regex.ReplaceAll( "123abc", "(\$0)" ) );

 

Output:

123(a)(b)(c)

What happened? I replaced each matching value, in this case the a, b, and c, with the matched text wrapped in parentheses. As such, "a" became "(a)", etc. You do not need to use groups in the replacement value; you can replace each letter with "D" if you wish. Keep in mind that if you are not trying to use a dollar group-identifier in the replacement string, you will want to escape the dollar symbol with a double backslash \\.
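Numbered groups can also be used to reorder parts of the matched text. Here is a short, untested sketch that assumes $1, $2, and $3 behave as described above; the date string and pattern are made up for illustration.

```chaiscript
var regex = Regex();
regex.Compile( "(\\d+)-(\\d+)-(\\d+)" ); // group 1: year, group 2: month, group 3: day

// \$ keeps ChaiScript from treating the dollar sign specially, as in the example above.
Ext.WriteStream( regex.ReplaceAll( "2020-07-29", "\$3/\$2/\$1" ) );
```

If the group references behave as described, this should write 29/07/2020.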

 

Notes:
  • It is important to remember that most regular expressions have backslashes in them. ChaiScript uses backslash in string literals to identify special characters like newline: \n and tab: \t. As such, all backslashes that are intended for the regular expression need to be double-backslash \\.
  • As the dollar $ symbol is important in regular expressions, it is important to remember that it also means something in ChaiScript. If ChaiScript finds a ${...} in the string, it will try to invoke the value inside the curly braces as if it were string-injected-script. This could be pretty messy if you accidentally mixed it with a regular expression. It is wise to ChaiScript-escape the $ in the string with a single backslash. If you also are trying to regular-expression-escape the dollar, you may need \\\$.
  • The Regex.Compile() function takes an optional second parameter, which is used to indicate options. The default is a case-insensitive, single-line regular expression. To shut off case-insensitivity but keep the other options, pass 0. To choose your own options, pass them as the second parameter, combined with the ChaiScript bitwise-or operator |. The available options are:
    • RegexOptionBitsConstants.IGNORECASE
    • RegexOptionBitsConstants.MULTILINE
    • RegexOptionBitsConstants.DOT_MATCHES_ALL
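As a rough illustration of the options parameter, the sketch below compiles the same pattern twice, once with 0 and once with two options combined. It is untested, the variable names and the "Tom" input are made up, and it relies only on the behavior described in the note above.

```chaiscript
var strict = Regex();
strict.Compile( "^t", 0 ); // 0: case-insensitivity switched off

var relaxed = Regex();
relaxed.Compile( "^t", RegexOptionBitsConstants.IGNORECASE | RegexOptionBitsConstants.MULTILINE );

// "Tom" is a made-up input. Per the note above, only the case-insensitive regex should match it.
if ( strict.Matches( "Tom" ) ) { Ext.WriteStream( "strict matched" ); }
if ( relaxed.Matches( "Tom" ) ) { Ext.WriteStream( "relaxed matched" ); }
```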

 

Comments
Don: What RE engine does ModScript use (e.g. PCRE, Oniguruma, Boost)? Basically, I would like to know its capabilities (non-greedy quantifiers, look-ahead / look-behind, number of captures).
We use PCRE2.
Don: Can you confirm that POSIX character classes are supported? I'm having trouble with:
var regex = Regex();
var str_Test = string();
str_Test = "256;Demo - User Role;9b2c1f28-d9c2-498d-9a87-51a00e9d5665;";
// Is last char punctuation?? If so, then it's the list delimiter.
regex.Compile("[" + ":" + "punct:]\\z"); // to avoid SerenaCentral converting LBRACKET COLON to a smiley
Ext.LogInfoMsg("str_Test regex 1 matches=${regex.Matches(str_Test)}");
On the other hand, Unicode character properties do seem to work:
regex.Compile("\\p{P}\\z");
Ext.LogInfoMsg("str_Test regex 1 matches=${regex.Matches(str_Test)}");
According to https://www.pcre.org/current/doc/html/pcre2syntax.html and https://www.regular-expressions.info/posixbrackets.html, POSIX character classes must be used inside a bracket expression. As such, the above doesn't work, but the fix is this (adding square brackets around the POSIX character class): regex.Compile("[[" + ":" + "punct:]]\\z") ;
Version history: Revision 3 of 3. Last update: 2020-07-29 16:04