Let me reach out to our support team!
When you delete connector_TASK<reponame>_datastore.db for a given repository, the next scan will always be a full scan. This can leave orphaned documents behind in ControlPoint if the source has changed since it was last ingested. If this happens, ControlPoint has a built-in scheduled task called Delete Orphaned Documents, available under Administration - Scheduled tasks - System. This task identifies items that are in Metastore but no longer in the source repository (by checking the scanID) and deletes them from Metastore/IDOL.
Below is an example of how a document can become orphaned:
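To illustrate the scanID check, here is a rough Python sketch. It is not ControlPoint's actual code: the record layout, the last_seen_scan_id field, and both function names are made up for illustration. It assumes each Metastore item remembers the ID of the last scan that saw it, so anything not touched by the latest full scan counts as an orphan.

```python
from dataclasses import dataclass


@dataclass
class MetastoreRecord:
    path: str
    last_seen_scan_id: int  # hypothetical field: ID of the last scan that saw this item


def find_orphans(records, latest_scan_id):
    """Return records that the latest full scan did not touch."""
    return [r for r in records if r.last_seen_scan_id < latest_scan_id]


def delete_orphans(records, latest_scan_id):
    """Return the records that survive once orphans are removed."""
    return [r for r in records if r.last_seen_scan_id >= latest_scan_id]


# Scan 1 ingested original.txt; scan 2 (a full scan) only saw new.txt.
records = [
    MetastoreRecord("original.txt", last_seen_scan_id=1),
    MetastoreRecord("new.txt", last_seen_scan_id=2),
]
orphan_paths = [r.path for r in find_orphans(records, latest_scan_id=2)]
remaining_paths = [r.path for r in delete_orphans(records, latest_scan_id=2)]
print(orphan_paths)
print(remaining_paths)
```

The real task may apply further conditions before deleting, so treat this only as a way to picture the comparison.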
Scan a repository to ingest 1 file.
Check the file is visible in CP UI.
Then delete the file in the source.
Add a new file to source.
Delete connector_TASK<reponame>_datastore.db for this repository.
Then run a scan, which will default to a full scan because connector_TASK<reponame>_datastore.db no longer exists.
Afterwards the new file will be picked up and ingested into CP.
The original file will NOT be removed and becomes an orphaned doc.
If you try to open this file you will not be able to because it does not exist on the source.
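The steps above can be pictured with a small toy model in Python. This is only a sketch under assumptions: the scan function, the set-based datastore, and the index are hypothetical stand-ins, not the connector's real logic. The idea is that an incremental scan compares the source against the datastore to work out adds and deletes, while a scan with no datastore has nothing to compare against and can only add.

```python
def scan(source_files, datastore, index):
    """Toy scan. `datastore` is the set of paths seen last time,
    or None when the datastore file has been deleted."""
    if datastore is None:
        # Full scan: every file looks new, and no deletes can be detected.
        for path in source_files:
            index.add(path)
    else:
        # Incremental scan: diff the source against the remembered state.
        for path in source_files - datastore:
            index.add(path)
        for path in datastore - source_files:
            index.discard(path)
    # The new datastore remembers what this scan saw.
    return set(source_files)


index = set()                       # stands in for what CP has ingested
source = {"original.txt"}
datastore = scan(source, None, index)   # first scan ingests 1 file

source = {"new.txt"}                # original deleted, new file added
datastore = None                    # datastore .db deleted
datastore = scan(source, datastore, index)

orphans = index - source
print(sorted(index))
print(sorted(orphans))
```

If the datastore had been kept, the second scan would have taken the incremental branch and removed original.txt from the index, which is why deleting the file is what creates the orphan here.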
If there is a tag on an item, can it still be removed during the nightly Delete Orphans process? We see items that have been erased from the source but remain in the repository even after the incremental scan. The Delete Orphans task identifies them in the log, but they still remain in the Metastore. Sort of a tagged, stuck orphan.