Match special characters with ControlPoint Eduction


ISSUE:

With ControlPoint, you can match patterns defined as regular expressions. For example, the following regular expression matches Turkish phone numbers in documents:
(([\ ]90)|([0][0-9]?)|(\ \(90\))|([\ ]([ ]?)\([0-9]{2}\))|(\(\ 90\))|([90]*))([ ]?)((\([0-9]{3}\))|([0-9]{3}))([ ]?)([0-9]{3})(\s*[\-]?)([0-9]{2})(\s*[\-]?)([0-9]{2})

The catch is that this RegEx contains a “ ” sign. By default, the ControlPoint pattern matching engine, also known as Eduction, excludes these special characters.
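Before putting a pattern like this into a grammar, it can help to sanity-check it against a few sample strings. The sketch below does that with Python's re module. Note the assumptions: Python's regex engine is not Eduction, so the syntax and matching behaviour may differ, the sample phone numbers are made up, and is_turkish_phone is a hypothetical helper name.

```python
import re

# The pattern from the article, copied verbatim into a raw string.
# Assumption: Python's `re` accepts the same syntax; Eduction may differ.
PHONE_TK = re.compile(
    r"(([\ ]90)|([0][0-9]?)|(\ \(90\))|([\ ]([ ]?)\([0-9]{2}\))|(\(\ 90\))|([90]*))"
    r"([ ]?)((\([0-9]{3}\))|([0-9]{3}))([ ]?)([0-9]{3})"
    r"(\s*[\-]?)([0-9]{2})(\s*[\-]?)([0-9]{2})"
)


def is_turkish_phone(text):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return PHONE_TK.fullmatch(text) is not None


if __name__ == "__main__":
    # Made-up sample values, some with parentheses and hyphens.
    for sample in ["0 (212) 555 12 34", "0212 555 12 34", "(212) 555-12-34", "hello"]:
        print(repr(sample), is_turkish_phone(sample))
```

A pattern that passes here can still fail to match in ControlPoint if Eduction excludes one of its characters, which is the situation the steps below address.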

HOW TO STEPS:

If you want to match them, you need to add a parameter in the section describing your “repository” in the Connector Framework Service (CFS) configuration file.

For example, if you have created “MyRepository” as a FileSystem source, edit the Indexer/FileSystem Connector Framework/ControlPointFileSystem Connector Framework.cfg file and add
TangibleCharacter=  to the parameters section of the repository.

[MyRepositoryEductionSettings]
MaxMatchesPerDoc=10000
TangibleCharacter=
SearchFields=DRECONTENT
Entity0=number/phone_tk
EntityField0=CPED_NUMBER_PHONE_TK
ResourceFiles=eduction\number_phone_tk.ecr
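
After editing the file, a quick check that the repository section still contains the keys from the example above can catch a misplaced line. The following is a minimal sketch, assuming the relevant part of the configuration file is INI-compatible enough for Python's configparser; real CFS files may contain constructs it does not handle. The names missing_eduction_keys and EXPECTED_KEYS are hypothetical, and the key list is simply the one shown in the example section, not an official list of required parameters.

```python
import configparser

# Keys copied from the example section above; illustrative, not an official list.
EXPECTED_KEYS = [
    "MaxMatchesPerDoc",
    "TangibleCharacter",
    "SearchFields",
    "Entity0",
    "EntityField0",
    "ResourceFiles",
]


def missing_eduction_keys(cfg_text, section):
    """Return the expected keys that are absent from the given section."""
    parser = configparser.ConfigParser(
        interpolation=None,    # keep '%' and backslashes literal
        strict=False,          # tolerate repeated sections or keys
        allow_no_value=True,   # tolerate lines without '='
    )
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        return list(EXPECTED_KEYS)
    # configparser lowercases option names by default, so compare in lowercase.
    present = set(parser.options(section))
    return [key for key in EXPECTED_KEYS if key.lower() not in present]


# The example section from this article, with the value left empty as shown there.
SAMPLE = r"""[MyRepositoryEductionSettings]
MaxMatchesPerDoc=10000
TangibleCharacter=
SearchFields=DRECONTENT
Entity0=number/phone_tk
EntityField0=CPED_NUMBER_PHONE_TK
ResourceFiles=eduction\number_phone_tk.ecr
"""

if __name__ == "__main__":
    print(missing_eduction_keys(SAMPLE, "MyRepositoryEductionSettings"))
```

The check only confirms that each key is present in the section; it does not validate the values.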
