Exclude a file from scan

ISSUE:

ControlPoint lets you exclude files when scanning sources. The graphical interface only lets you exclude file extensions. What if you want to exclude a specific file?

HOW TO STEPS:

The connector configuration file gives you more granularity in what you can exclude.

Open the connector configuration file (.cfg).

Go to the task section of the repository, e.g. [TaskMyRepository].

Add the PathCantHaveRegex parameter. In the following example, we exclude "thumb.db" files and all PDF files.

PathCantHaveRegex0=.*thumb\.db
PathCantHaveRegex1=.*\.(pdf)

Note that you can specify multiple PathCantHaveRegex parameters by appending a number to the parameter name, as shown above.

Also note that once you add these entries manually to the configuration file, you should no longer use the corresponding exclusion field in the graphical interface, as the two settings will conflict.
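Before editing the connector configuration, it can help to check offline which paths a pattern would catch. The following is a minimal sketch in Python, not the connector's own logic. It assumes that each pattern must match the whole path (hence re.fullmatch) and that Python's regex dialect is close enough to the connector's. Both are assumptions to verify against the connector documentation. The function name is_excluded and the sample paths are hypothetical.

```python
import re

# Patterns copied from the PathCantHaveRegex0 / PathCantHaveRegex1 example.
EXCLUDE_PATTERNS = (r".*thumb\.db", r".*\.(pdf)")


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """Return True if the whole path matches any exclusion pattern.

    Assumption: whole-path matching and Python's regex dialect stand in
    for the connector's behaviour, which may differ (e.g. in case handling).
    """
    return any(re.fullmatch(p, path) for p in patterns)


# Hypothetical sample paths, for illustration only.
SAMPLES = [
    r"\\server\share\photos\thumb.db",
    r"\\server\share\docs\report.pdf",
    r"\\server\share\docs\report.pdf.bak",
    r"\\server\share\docs\notes.txt",
]

for sample in SAMPLES:
    print(sample, "->", "excluded" if is_excluded(sample) else "scanned")
```

Under these assumptions the first two sample paths are excluded and the last two are scanned. A pattern without a trailing ".*" does not catch a path such as report.pdf.bak.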

See also the file system connector documentation: https://www.microfocus.com/documentation/idol/IDOL_12_3/FileSystemConnector_12.3_Documentation/Help/index.html#Configuration/TaskName/PathCantHaveRegex.htm

Comment List
  • Thanks Colin. I will try it today after taking a backup; we are using ControlPoint 5.8 only.

    Step 1 is correct: drop the database and run exportBI_database_creation.sql (this creates the database, tables, views, etc.).

    Step 2: Run the PowerShell script "Initial_export.ps1", passing the parameters dbServer storagePath [sqlUser] [sqlPwd].

    Step 3: Verify that the data was populated and that the two views, ControlPoint.Documents and ControlPoint.Policies, are filled.

    Step 4: Either run the stored procedure manually (EXEC [dbo].[RefreshPublishedDocuments]) or schedule a job so that it refreshes the data from MetaStore every day.

  • Hi,

    There was an issue in CP 5.7 where it was not possible to run db_bcp_initial_export_import.sql after ControlPointExport was deleted and recreated.

    This was resolved in CP 5.8.

    So if you are using CP 5.8 or above, it should be possible to:
    1) Delete the ControlPointExport DB and run exportBI_database_creation.sql to create the ControlPointExport DB again.
    2) Edit db_bcp_initial_export_import.sql with the datapath, username, password values, etc.
    3) Run db_bcp_initial_export_import.sql.
    etc.

    Regards,
    Colin

  • Hi Colin,

    So we can remove the "ControlPointExport" database at any time, rerun the scripts to create it and import the data using PowerShell, and, if needed, run the refresh stored procedure to bring it up to date, right?

    I am getting an error while running the stored procedure "RefreshPublishedDocuments" and am not sure whether it is worth investigating, or whether I should drop and recreate the database to see how it behaves. (Attachment: ExportDPCP Error.JPG)

    Note: When I queried [dbo].[UpdateLogExport] ordered by logTimeStamp, I found the error "Error 547 Inserting new records in documentPolicy. Aborting Execution". For Policies, the table and view record counts do not match, and the view also contains many duplicate records.

    select count(*) from [ControlPointExport].[ControlPoint].[Policies]  with (nolock) --23165
    select COUNT(DISTINCT UUID) from [ControlPoint].[Policies] with (nolock)--1461

    select count(*) from [dbo].[DocumentPolicy]  with (nolock) --1461
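
The counts above (23165 rows against 1461 distinct UUIDs) suggest that some UUIDs appear many times in the view. As a rough illustration of how to list the repeated values, here is a minimal sketch using an in-memory SQLite table as a stand-in for the view. The table layout, the PolicyName column and the rows are all hypothetical, and the GROUP BY ... HAVING query would still need adapting and testing against the real SQL Server view.

```python
import sqlite3

# In-memory stand-in for the Policies view; column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Policies (UUID TEXT, PolicyName TEXT)")
conn.executemany(
    "INSERT INTO Policies VALUES (?, ?)",
    [
        ("a1", "Dispose"), ("a1", "Dispose"), ("a1", "Dispose"),
        ("b2", "Hold"),
        ("c3", "Dispose"), ("c3", "Dispose"),
    ],
)


def duplicated_uuids(connection):
    """Return (UUID, row_count) pairs for UUIDs that appear more than once."""
    return connection.execute(
        "SELECT UUID, COUNT(*) AS n FROM Policies "
        "GROUP BY UUID HAVING COUNT(*) > 1 "
        "ORDER BY n DESC, UUID"
    ).fetchall()


# Same two counts as in the comment above, on the toy data.
total_rows = conn.execute("SELECT COUNT(*) FROM Policies").fetchone()[0]
distinct_uuids = conn.execute(
    "SELECT COUNT(DISTINCT UUID) FROM Policies"
).fetchone()[0]

print("rows:", total_rows, "distinct UUIDs:", distinct_uuids)
print("duplicated:", duplicated_uuids(conn))
```

Listing the most repeated UUIDs first may help narrow down which policy assignments produce the extra rows.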

  • Thanks Colin. This was one of my pending tasks and it is now completed: exporting the ControlPoint data and refreshing the ControlPointExport database. Based on a quick check, the disposed documents still do not exist in the export database, but I will explore further to understand this database and how to capture the potential disk saving.

  • Hi,

    As part of the BI Analysis functionality shipped with ControlPoint there is a RefreshPublishedDocuments stored procedure you can run.
    This ensures the ControlPointExport database contains the most recent data to reflect repository re-scans or policy assignments for your analysis.

    I am not aware of any other methods.

    Regards,
    Colin

  • Thanks Keith, noted. This will be fine for a read-only legacy file share, but it is difficult to capture for an active file share when scanning and dispose-in-place are scheduled to recur, unless I schedule a report (SSRS / Power BI) to calculate the space saving before the scheduled CP scan. For example, if the CP repository scan is scheduled for 2 P.M. every day, I would schedule my SSRS report to run and calculate the space at 1:30 P.M.

    Is there any other option, or is anyone doing this a different way? Thanks in advance.

  • To calculate the space saving, run the report/SQL query either before you execute the dispose policy, or after policy execution but before you run another scan on the repository.

     

    Regards,

    Keith

  • Thanks Keith, that helps me understand the behaviour. So, if we want to get the "potential space saving" as part of this disposal, we have to run the Statistics report or a SQL query to calculate the space before deleting via policy?

    The reason I ask is that [AuditDocumentEventArchive] and [AuditDocumentEvent] keep the deleted records, but they do not give the size or file extension the way the "MetaData" table does. Likewise, all the audit reports give the file name, reference, policy and category, but not the size or the space saved once the repository status has changed to "Managed" and all the documents have been deleted.

    Is this also expected behaviour, or is there anywhere we can check the potential space saving after disposing of all the documents?

  • Hi,

    DocumentStatus 16 within the [ControlPointMetaStore].[Metadata].[Document] table is for files that have been disposed via ControlPoint policy. Once an Incremental scan is executed, the connector will identify that the files no longer exist in the source repository and send Delete actions to MetaStore which will remove those documents from the [ControlPointMetaStore].[Metadata].[Document] table. What you are seeing is expected behaviour as we don't keep references of deleted files in the Document table once re-scanned.

     

    If you have any further issues on this please raise a support ticket and I will be happy to assist.

     

    Regards,

    Keith

  • Any update? I will put it in simplified terms to understand the ControlPoint behaviour:

    1) I ran the policy and it disposed of all documents from the legacy file share as expected. I ran the report to confirm, and it is fine.

    2) I checked the table to confirm that DocumentStatus changed to 16, and it was fine.

    SELECT count(*) as "Documents Deleted"
    FROM "Metadata"."Document" with (nolock) WHERE "RepositoryId" = XX
    AND NOT "LDCMD5" IS NULL AND "DocumentStatus" = 16

    3) Only one lnk file was left over after policy execution. I understand that this is due to the "url and lnk" setting, which excludes these files from policy execution.

    4) I deleted the lnk file directly from the legacy share. However, the repository under Managed [XX] still showed 1 item (note: the legacy share is empty now). I ran an incremental scan on the repository, which had no effect; it still showed 1 item. I then ran a full scan, and it now shows 0 items.

    5) All the records belonging to that repository [XX] have been deleted from the MetaData table. However, I can now see those entries in [ControlPointMetadata].[AuditDocumentEventArchive] and [ControlPointMetadata].[AuditDocumentEvent].

    Question:

    Is this expected behaviour, or is anything missing here? I thought CP would also keep the metadata entries in "Metadata.Document", with the flag 16, for our reference.

    Thanks in advance.
