CP to Export list of items with metadata in a CSV

2 Likes
over 1 year ago

ControlPoint allows the data analysts to export a list of items in a CSV file with the desired associated metadata.

Go to the report, as known as category, in the list of items. Select those you want to export in the list.

image.png

Go to ACTIONS and select EXPORT.

image.png

You get a CSV file with all items selected.

image.png

To customize the columns, go to ADMINISTRATION >>> SETTINGS >>> GENERAL >>> FIELDS >>> ITEMS PROPERTIES.

image.png

You can add or remove fields that are exported in the CSV file.

 

Labels:

How To-Best Practice
Comment List
Anonymous
  • Thanks Keith, Will wait for Colin to pick up my request that I raised. BTW, I already tried MATCH ON SUBITEMS = YES , and the result is completely weird and more than what I expect as Non-zero bytes files coming.

     

  • Tarik is correct and any further investigation should be carried out in the support ticket.

    In answer to question though for any followers on here see below:

     

    If you want to allow Category to match the results that are Subitems then within the Category settings, click on the Options tab and set:

    MATCH ON SUBITEMS = YES

     

    When you go back into the Results tab, you should now see the 0 byte sub item file.

  • Thank you.

    He will definitely do so once he comes online.

     

     

  • Thanks Tbaki, The ticket has been raised. It will be good if Colin can pickup this request as it will easy continue to discuss with me and address. [ SD02699630 ]

  • This has now gone into full blown analysis and requires a ticket.

    Colin will only be able to continue if a ticket is created.

     

    Please create one and let us know the number.

    Thanks

    Tarik

  • Hi Colin,

    This helps to identify what is happening behind the scene. CP building dynamic query by fetching the result and insert into the temp table [ #Dataset] and process the results by join query with [ControlPointMetadata"."Additional", MetaData.Document, Metadata"."LocationHash" based on different conditions during the execution.

     

    1) I tried SQL Profiler and captured the same query as below. [ Only the repository ID is different in my case as you can see it is "24"

    SELECT TOP 25 ROW_NUMBER() OVER(ORDER BY ":SortExpr:0", ":SortExpr:1") AS ":Hit", ":RepositoryId", ":DocKey:0", "DREREFERENCE", "AU_DOCUMENT_FILESIZE_BYTES", "CPROOTPARENTREFERENCE" INTO "#DataSet" FROM (SELECT ROW_NUMBER() OVER(PARTITION BY t1."MatchArchive" ORDER BY t1."MatchArchive") AS ":CombineNumber", t0."RepositoryId" AS ":RepositoryId", t0."DocKey" AS ":DocKey:0", t0."RepositoryId" AS ":SortExpr:0", t0."DocKey" AS ":SortExpr:1", t0."DocKey" AS "DREREFERENCE", t0."FileSize" AS "AU_DOCUMENT_FILESIZE_BYTES", t1."MatchArchive" AS "CPROOTPARENTREFERENCE" FROM "Metadata"."Document" t0 LEFT OUTER JOIN "ControlPointMetadata"."Additional" t1 ON t0."RepositoryId" = t1."RepositoryId" AND t0."DocKey" = t1."DocKey" WHERE t0."RepositoryId" = 14 AND t0."FileSize" = 0) tbl WHERE ":CombineNumber" = 1
    ORDER BY ":Hit";


    SELECT t0.":Hit", 0 AS ":FieldSet", t0.":RepositoryId", t1."Value" AS "DREREFERENCE", t2."DocumentName" AS "DRETITLE", t2."FileExtension" AS "IMPORTMAGICEXTENSION", t2."DocumentStatus" AS "CP_DOCUMENT_STATUS", t3."Value" AS "DREROOTPARENTREFERENCE", t2."OriginalFileExtension" AS "ORIGINALEXTENSION", t0."AU_DOCUMENT_FILESIZE_BYTES", t2."DisplayName" AS "AU_REPOSITORY_TITLE_STRING", t2."AutnIdentifierValue" AS "AUTN_IDENTIFIER", t4."Value" AS "CPLOCATION", t5."Value" AS "CPROOTFAMILYREFERENCE", t7."Value" AS "CPROOTPARENTREFERENCE", t2."DateLastModified" AS "AU_REPOSITORY_MODIFIEDDATE_EPOCHSECONDS", NULL AS "CP_HOLD" FROM "#DataSet" t0 INNER JOIN "Metadata"."ReferenceHash" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0."DREREFERENCE" = t1."HashKey" INNER JOIN "Metadata"."Document" t2 ON t0.":RepositoryId" = t2."RepositoryId" AND t0.":DocKey:0" = t2."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t3 ON t0.":RepositoryId" = t3."RepositoryId" AND t2."RootParentReferenceHash" = t3."HashKey" LEFT OUTER JOIN "Metadata"."LocationHash" t4 ON t0.":RepositoryId" = t4."RepositoryId" AND t2."LocationHash" = t4."HashKey" LEFT OUTER JOIN "ControlPointMetadata"."Additional" t6 ON t0.":RepositoryId" = t6."RepositoryId" AND t0.":DocKey:0" = t6."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t5 ON t0.":RepositoryId" = t5."RepositoryId" AND t6."MatchWithinArchive" = t5."HashKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t7 ON t0.":RepositoryId" = t7."RepositoryId" AND t0."CPROOTPARENTREFERENCE" = t7."HashKey" UNION ALL
    SELECT t0.":Hit", 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId" FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicyHold" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey"
    ORDER BY ":Hit", ":FieldSet"

     

    Step2) I ran the above query and can get the exact 2 rows with zero bytes. 1 row is document with zero bytes and other row is one PST File. When I compare the query output, realized inside the PST File , there is one zero byte file which is correct and  showing in the query results. But , during category result, it is displaying the parentdocument [DREROOTPARENTREFERENCE ] instead of the child [ DREREFERENCE ] . In my case instead of showing the zero byte file inside the PST, it is showing the pst file itself in the category result which is wrong. Because this pst size is 100 MB.

     

     

    :RepositoryIdDREREFERENCEDRETITLEAU_DOCUMENT_FILESIZE_BYTESDREROOTPARENTREFERENCE24\\XXXXXX\FileShare\ScanTest\addresspoints_line.MIDaddresspoints_line.MID0NULL24\\XXXXXX\FileShare\ScanTest\Sample File-CP\XXX-Way Tim.pst:/Top of Personal Folders/FW Count_XXXX.mail:base_PM_version_1_7 2001 for updating zone system XXX Check.zip:base_PM_version_1_7 2001 for updating zone system XXX Check/leftleft0\\XXXX\FileShare\ScanTest\Sample File-CP\XXX-Way Tim.pst

    3) Have someone tried this scenario where 0 byte files inside Zip or inside pst and Analysis including "Analyse Sub Items - Yes" and when you run the category the results are showing not correct, even though the query output will show as zero bytes only.

    4) Anything to do with Fine-Tuning or options in the Category or any way to display the result of the "DREREFERENCE" in the category result [instead of DREROOTPARENTREFERENCE]which I believe is the solution to the problem. Unless this category will work only on the documents without sub items?. Pls advice.

     

  • Check in this table that the Filesize column is 0 for your file:

    select * from [ControlPointMetaStore].[Metadata].[Document] WITH (nolock)
    where DocumentName = 'zero k file.txt'

    replace 'zero k file.txt' with the name of file you are wishing to check in metastore


    Also you can use SQL profiler shipped with SQL Server to see the underlying query issued to SQL.

    Within SQL profiler Select the following event to capture:
    TSQL: SQL:BatchCompleted

    Click Show all columns

    Then select Column Filter and Add DatabaseName Like ControlPointMetaStore

    This limits the traffic to just those calls to ControlPointMetaStore database.

    Click Run and then capture yourself clicking the category to display the docs in the listview.

    Then stop the trace.

    You will see a line in the resulting SQL trace that contains 2 lines like below:

    SELECT TOP 25 ROW_NUMBER() OVER(ORDER BY ":SortExpr:0", ":SortExpr:1") AS ":Hit", ":RepositoryId", ":DocKey:0", "DREREFERENCE", "AU_DOCUMENT_FILESIZE_BYTES", "CPROOTPARENTREFERENCE" INTO "#DataSet" FROM (SELECT ROW_NUMBER() OVER(PARTITION BY t1."MatchArchive" ORDER BY t1."MatchArchive") AS ":CombineNumber", t0."RepositoryId" AS ":RepositoryId", t0."DocKey" AS ":DocKey:0", t0."RepositoryId" AS ":SortExpr:0", t0."DocKey" AS ":SortExpr:1", t0."DocKey" AS "DREREFERENCE", t0."FileSize" AS "AU_DOCUMENT_FILESIZE_BYTES", t1."MatchArchive" AS "CPROOTPARENTREFERENCE" FROM "Metadata"."Document" t0 LEFT OUTER JOIN "ControlPointMetadata"."Additional" t1 ON t0."RepositoryId" = t1."RepositoryId" AND t0."DocKey" = t1."DocKey" WHERE t0."RepositoryId" = 1 AND t0."FileSize" = 0) tbl WHERE ":CombineNumber" = 1
    ORDER BY ":Hit";

    SELECT t0.":Hit", 0 AS ":FieldSet", t0.":RepositoryId", t1."Value" AS "DREREFERENCE", t2."DocumentName" AS "DRETITLE", t2."FileExtension" AS "IMPORTMAGICEXTENSION", t2."DocumentStatus" AS "CP_DOCUMENT_STATUS", t3."Value" AS "DREROOTPARENTREFERENCE", t2."OriginalFileExtension" AS "ORIGINALEXTENSION", t0."AU_DOCUMENT_FILESIZE_BYTES", t2."DisplayName" AS "AU_REPOSITORY_TITLE_STRING", t2."AutnIdentifierValue" AS "AUTN_IDENTIFIER", t4."Value" AS "CPLOCATION", t5."Value" AS "CPROOTFAMILYREFERENCE", t7."Value" AS "CPROOTPARENTREFERENCE", t2."DateLastModified" AS "AU_REPOSITORY_MODIFIEDDATE_EPOCHSECONDS", NULL AS "CP_HOLD", NULL AS "CP_POLICYID" FROM "#DataSet" t0 INNER JOIN "Metadata"."ReferenceHash" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0."DREREFERENCE" = t1."HashKey" INNER JOIN "Metadata"."Document" t2 ON t0.":RepositoryId" = t2."RepositoryId" AND t0.":DocKey:0" = t2."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t3 ON t0.":RepositoryId" = t3."RepositoryId" AND t2."RootParentReferenceHash" = t3."HashKey" LEFT OUTER JOIN "Metadata"."LocationHash" t4 ON t0.":RepositoryId" = t4."RepositoryId" AND t2."LocationHash" = t4."HashKey" LEFT OUTER JOIN "ControlPointMetadata"."Additional" t6 ON t0.":RepositoryId" = t6."RepositoryId" AND t0.":DocKey:0" = t6."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t5 ON t0.":RepositoryId" = t5."RepositoryId" AND t6."MatchWithinArchive" = t5."HashKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t7 ON t0.":RepositoryId" = t7."RepositoryId" AND t0."CPROOTPARENTREFERENCE" = t7."HashKey" UNION ALL
    SELECT t0.":Hit", 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId", NULL FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicyHold" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey" UNION ALL
    SELECT t0.":Hit", 2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId" FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicy" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey"
    ORDER BY ":Hit", ":FieldSet"

     

    You can copy each of these to SQL Management Studio then run them directly one after another to see the search issued directly against the ControlPointMetaStore DB.

    Take care to run it against ControlPointMetaStore DB.

    In my case I got back the row for the one file that was 0kB in size.

  • Thanks Colin. I have already done the exact same steps, but the results are coming zero byte files and non-zero byte files.

    1) General > Fields > Rule Builder and add Filesize [Map to column AU_Document_FileSize_Bytes]

    2) Restarted the engine service and can able to select the drop down from the new category now as File Size.

    3) Category > New and under  Training > Field Text : File Size is Equal to 0

    Note: I am using CP 5.8 version. It looks weird behavior. Anything missing here?

    My understanding is that this Category is something dynamic query CP will create  and execute against Database and return the results based on what we are adding. Is that any other option to enable trace or check behind the scene what is happening as I could't get much information from the Indexer logs.

     

  • This worked for me in CP 5.8:

    Ingest an empty text file (0KB in size) into a repository called Fileshare1.

    Wait for ingestion to fully complete.

    Access Administration > Settings > General > Fields to add Filesize to the list of fields (if it is not already there).

    Restart CP engine.

    Create a training category referencing repository Fileshare1 with
    Filesize is equal to 0.

    This returned the file that was 0KB in size.

  • Team,

     

    Have someone  tried the category with field text  with Filesize is 0. Even the filter have an option to showing <32 KB and I am not able to create category with 0 bytes documents. I can run the query in SQL Server to check file size is zero or File extension is IS NULL, but via category? any input pls.

Related Discussions
Recommended