CP to Export list of items with metadata in a CSV

3 Likes

ControlPoint allows the data analysts to export a list of items in a CSV file with the desired associated metadata.

Go to the report, as known as category, in the list of items. Select those you want to export in the list.

image.png

Go to ACTIONS and select EXPORT.

image.png

You get a CSV file with all items selected.

image.png

To customize the columns, go to ADMINISTRATION >>> SETTINGS >>> GENERAL >>> FIELDS >>> ITEMS PROPERTIES.

image.png

You can add or remove fields that are exported in the CSV file.

 

Labels:

How To-Best Practice
Comment List
Anonymous
  • Part of the reason we located files with a size greater than Zero was that several files had attachments indexed also had zero byte files, therefore they were included in the results.

  • The max that a repository will export in the UI is 500K, is there a SQL script any has prepared that will perform the same metadata export for the repository if it exceeds 500K?

  • Was there any resolution to this request?  I also have tried multiple ways to locate Zero byte files, and it always includes files larger as well in the Category.

  • Thanks Keith, Will wait for Colin to pick up my request that I raised. BTW, I already tried MATCH ON SUBITEMS = YES , and the result is completely weird and more than what I expect as Non-zero bytes files coming.

     

  • Tarik is correct and any further investigation should be carried out in the support ticket.

    In answer to question though for any followers on here see below:

     

    If you want to allow Category to match the results that are Subitems then within the Category settings, click on the Options tab and set:

    MATCH ON SUBITEMS = YES

     

    When you go back into the Results tab, you should now see the 0 byte sub item file.

  • Thank you.

    He will definitely do so once he comes online.

     

     

  • Thanks Tbaki, The ticket has been raised. It will be good if Colin can pickup this request as it will easy continue to discuss with me and address. [ SD02699630 ]

  • This has now gone into full blown analysis and requires a ticket.

    Colin will only be able to continue if a ticket is created.

     

    Please create one and let us know the number.

    Thanks

    Tarik

  • Hi Colin,

    This helps to identify what is happening behind the scene. CP building dynamic query by fetching the result and insert into the temp table [ #Dataset] and process the results by join query with [ControlPointMetadata"."Additional", MetaData.Document, Metadata"."LocationHash" based on different conditions during the execution.

     

    1) I tried SQL Profiler and captured the same query as below. [ Only the repository ID is different in my case as you can see it is "24"

    SELECT TOP 25 ROW_NUMBER() OVER(ORDER BY ":SortExpr:0", ":SortExpr:1") AS ":Hit", ":RepositoryId", ":DocKey:0", "DREREFERENCE", "AU_DOCUMENT_FILESIZE_BYTES", "CPROOTPARENTREFERENCE" INTO "#DataSet" FROM (SELECT ROW_NUMBER() OVER(PARTITION BY t1."MatchArchive" ORDER BY t1."MatchArchive") AS ":CombineNumber", t0."RepositoryId" AS ":RepositoryId", t0."DocKey" AS ":DocKey:0", t0."RepositoryId" AS ":SortExpr:0", t0."DocKey" AS ":SortExpr:1", t0."DocKey" AS "DREREFERENCE", t0."FileSize" AS "AU_DOCUMENT_FILESIZE_BYTES", t1."MatchArchive" AS "CPROOTPARENTREFERENCE" FROM "Metadata"."Document" t0 LEFT OUTER JOIN "ControlPointMetadata"."Additional" t1 ON t0."RepositoryId" = t1."RepositoryId" AND t0."DocKey" = t1."DocKey" WHERE t0."RepositoryId" = 14 AND t0."FileSize" = 0) tbl WHERE ":CombineNumber" = 1
    ORDER BY ":Hit";


    SELECT t0.":Hit", 0 AS ":FieldSet", t0.":RepositoryId", t1."Value" AS "DREREFERENCE", t2."DocumentName" AS "DRETITLE", t2."FileExtension" AS "IMPORTMAGICEXTENSION", t2."DocumentStatus" AS "CP_DOCUMENT_STATUS", t3."Value" AS "DREROOTPARENTREFERENCE", t2."OriginalFileExtension" AS "ORIGINALEXTENSION", t0."AU_DOCUMENT_FILESIZE_BYTES", t2."DisplayName" AS "AU_REPOSITORY_TITLE_STRING", t2."AutnIdentifierValue" AS "AUTN_IDENTIFIER", t4."Value" AS "CPLOCATION", t5."Value" AS "CPROOTFAMILYREFERENCE", t7."Value" AS "CPROOTPARENTREFERENCE", t2."DateLastModified" AS "AU_REPOSITORY_MODIFIEDDATE_EPOCHSECONDS", NULL AS "CP_HOLD" FROM "#DataSet" t0 INNER JOIN "Metadata"."ReferenceHash" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0."DREREFERENCE" = t1."HashKey" INNER JOIN "Metadata"."Document" t2 ON t0.":RepositoryId" = t2."RepositoryId" AND t0.":DocKey:0" = t2."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t3 ON t0.":RepositoryId" = t3."RepositoryId" AND t2."RootParentReferenceHash" = t3."HashKey" LEFT OUTER JOIN "Metadata"."LocationHash" t4 ON t0.":RepositoryId" = t4."RepositoryId" AND t2."LocationHash" = t4."HashKey" LEFT OUTER JOIN "ControlPointMetadata"."Additional" t6 ON t0.":RepositoryId" = t6."RepositoryId" AND t0.":DocKey:0" = t6."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t5 ON t0.":RepositoryId" = t5."RepositoryId" AND t6."MatchWithinArchive" = t5."HashKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t7 ON t0.":RepositoryId" = t7."RepositoryId" AND t0."CPROOTPARENTREFERENCE" = t7."HashKey" UNION ALL
    SELECT t0.":Hit", 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId" FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicyHold" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey"
    ORDER BY ":Hit", ":FieldSet"

     

    Step2) I ran the above query and can get the exact 2 rows with zero bytes. 1 row is document with zero bytes and other row is one PST File. When I compare the query output, realized inside the PST File , there is one zero byte file which is correct and  showing in the query results. But , during category result, it is displaying the parentdocument [DREROOTPARENTREFERENCE ] instead of the child [ DREREFERENCE ] . In my case instead of showing the zero byte file inside the PST, it is showing the pst file itself in the category result which is wrong. Because this pst size is 100 MB.

     

     

    :RepositoryIdDREREFERENCEDRETITLEAU_DOCUMENT_FILESIZE_BYTESDREROOTPARENTREFERENCE24\\XXXXXX\FileShare\ScanTest\addresspoints_line.MIDaddresspoints_line.MID0NULL24\\XXXXXX\FileShare\ScanTest\Sample File-CP\XXX-Way Tim.pst:/Top of Personal Folders/FW Count_XXXX.mail:base_PM_version_1_7 2001 for updating zone system XXX Check.zip:base_PM_version_1_7 2001 for updating zone system XXX Check/leftleft0\\XXXX\FileShare\ScanTest\Sample File-CP\XXX-Way Tim.pst

    3) Have someone tried this scenario where 0 byte files inside Zip or inside pst and Analysis including "Analyse Sub Items - Yes" and when you run the category the results are showing not correct, even though the query output will show as zero bytes only.

    4) Anything to do with Fine-Tuning or options in the Category or any way to display the result of the "DREREFERENCE" in the category result [instead of DREROOTPARENTREFERENCE]which I believe is the solution to the problem. Unless this category will work only on the documents without sub items?. Pls advice.

     

  • Check in this table that the Filesize column is 0 for your file:

    select * from [ControlPointMetaStore].[Metadata].[Document] WITH (nolock)
    where DocumentName = 'zero k file.txt'

    replace 'zero k file.txt' with the name of file you are wishing to check in metastore


    Also you can use SQL profiler shipped with SQL Server to see the underlying query issued to SQL.

    Within SQL profiler Select the following event to capture:
    TSQL: SQL:BatchCompleted

    Click Show all columns

    Then select Column Filter and Add DatabaseName Like ControlPointMetaStore

    This limits the traffic to just those calls to ControlPointMetaStore database.

    Click Run and then capture yourself clicking the category to display the docs in the listview.

    Then stop the trace.

    You will see a line in the resulting SQL trace that contains 2 lines like below:

    SELECT TOP 25 ROW_NUMBER() OVER(ORDER BY ":SortExpr:0", ":SortExpr:1") AS ":Hit", ":RepositoryId", ":DocKey:0", "DREREFERENCE", "AU_DOCUMENT_FILESIZE_BYTES", "CPROOTPARENTREFERENCE" INTO "#DataSet" FROM (SELECT ROW_NUMBER() OVER(PARTITION BY t1."MatchArchive" ORDER BY t1."MatchArchive") AS ":CombineNumber", t0."RepositoryId" AS ":RepositoryId", t0."DocKey" AS ":DocKey:0", t0."RepositoryId" AS ":SortExpr:0", t0."DocKey" AS ":SortExpr:1", t0."DocKey" AS "DREREFERENCE", t0."FileSize" AS "AU_DOCUMENT_FILESIZE_BYTES", t1."MatchArchive" AS "CPROOTPARENTREFERENCE" FROM "Metadata"."Document" t0 LEFT OUTER JOIN "ControlPointMetadata"."Additional" t1 ON t0."RepositoryId" = t1."RepositoryId" AND t0."DocKey" = t1."DocKey" WHERE t0."RepositoryId" = 1 AND t0."FileSize" = 0) tbl WHERE ":CombineNumber" = 1
    ORDER BY ":Hit";

    SELECT t0.":Hit", 0 AS ":FieldSet", t0.":RepositoryId", t1."Value" AS "DREREFERENCE", t2."DocumentName" AS "DRETITLE", t2."FileExtension" AS "IMPORTMAGICEXTENSION", t2."DocumentStatus" AS "CP_DOCUMENT_STATUS", t3."Value" AS "DREROOTPARENTREFERENCE", t2."OriginalFileExtension" AS "ORIGINALEXTENSION", t0."AU_DOCUMENT_FILESIZE_BYTES", t2."DisplayName" AS "AU_REPOSITORY_TITLE_STRING", t2."AutnIdentifierValue" AS "AUTN_IDENTIFIER", t4."Value" AS "CPLOCATION", t5."Value" AS "CPROOTFAMILYREFERENCE", t7."Value" AS "CPROOTPARENTREFERENCE", t2."DateLastModified" AS "AU_REPOSITORY_MODIFIEDDATE_EPOCHSECONDS", NULL AS "CP_HOLD", NULL AS "CP_POLICYID" FROM "#DataSet" t0 INNER JOIN "Metadata"."ReferenceHash" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0."DREREFERENCE" = t1."HashKey" INNER JOIN "Metadata"."Document" t2 ON t0.":RepositoryId" = t2."RepositoryId" AND t0.":DocKey:0" = t2."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t3 ON t0.":RepositoryId" = t3."RepositoryId" AND t2."RootParentReferenceHash" = t3."HashKey" LEFT OUTER JOIN "Metadata"."LocationHash" t4 ON t0.":RepositoryId" = t4."RepositoryId" AND t2."LocationHash" = t4."HashKey" LEFT OUTER JOIN "ControlPointMetadata"."Additional" t6 ON t0.":RepositoryId" = t6."RepositoryId" AND t0.":DocKey:0" = t6."DocKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t5 ON t0.":RepositoryId" = t5."RepositoryId" AND t6."MatchWithinArchive" = t5."HashKey" LEFT OUTER JOIN "Metadata"."ReferenceHash" t7 ON t0.":RepositoryId" = t7."RepositoryId" AND t0."CPROOTPARENTREFERENCE" = t7."HashKey" UNION ALL
    SELECT t0.":Hit", 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId", NULL FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicyHold" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey" UNION ALL
    SELECT t0.":Hit", 2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, t1."PolicyId" FROM "#DataSet" t0 INNER JOIN "ControlPointMetadata"."DocumentPolicy" t1 ON t0.":RepositoryId" = t1."RepositoryId" AND t0.":DocKey:0" = t1."DocKey"
    ORDER BY ":Hit", ":FieldSet"

     

    You can copy each of these to SQL Management Studio then run them directly one after another to see the search issued directly against the ControlPointMetaStore DB.

    Take care to run it against ControlPointMetaStore DB.

    In my case I got back the row for the one file that was 0kB in size.

Related Discussions
Recommended