
404 page not removed from database

I have a page in our IDOL database that has since been removed from the site. When the web connector crawls that page it sees a 404 but doesn't remove the document. What am I missing?

The web connector log is below, followed by the config.

08/04/2020 09:19:34 [416] 70-Error: NTRSIMMEDIATE: Root url not found: https://www.xxx.com/apage-that-is-in-the-database/but-has-now-returns-404
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:End load url: https://www.xxx.com/apage-that-is-in-the-database/but-has-now-returns-404
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: __cfduid ; Domain=.app-ab23.marketo.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: bm_sv ; Domain=.northerntrust.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: __cfduid ; Domain=.onetrust.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:HTTP Status: 404
08/04/2020 09:19:34 [273] 10-Full: NTRSIMMEDIATE: WKOOP 2f0294f0c6ec187458d05265a17690f5 task complete
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Waiting for task
08/04/2020 09:19:34 [414] 70-Error: NTRSIMMEDIATE: Root url not found: https://www.xxx.com/apage-that-is-in-the-database/but-has-now-returns-404
08/04/2020 09:19:34 [18] 10-Full: NTRSIMMEDIATE: Finished processing depth: 0, 1 pages
08/04/2020 09:19:34 [18] 10-Full: NTRSIMMEDIATE: 0 unseen urls, 0 documents removed

 

Config:

[immediate]
Url0=https://www.xxx.com/apage-that-is-in-the-database/but-has-now-returns-404
IngestEnableDeletes=true
StayOnSite=true
IndexDatabase=xxx
Depth=0
SpiderUrlMustHaveRegex=.*
UserAgent=IDOL12-immediate

 


Accepted Solutions

Re: 404 page not removed from database

The error occurs because this page isn't just a document that exists in the database; it is the configured root URL for the task. Root URLs are the task's entry point into the site and always need to be reachable; otherwise, the connector can't distinguish a removed page from a misconfigured task or a failure of the site itself. A task without valid root URLs isn't viable, so the connector never gets as far as fully starting the task, let alone issuing deletes.
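
The behaviour described above can be pictured with a toy model. This is only a sketch of the logic as explained in this post, not the connector's actual code: `run_task`, `fetch_status`, and the returned dictionary are hypothetical names invented for illustration, and treating "at least one reachable root URL" as the condition is an assumption.

```python
from typing import Callable, Dict, List


def run_task(root_urls: List[str], fetch_status: Callable[[str], int]) -> Dict[str, bool]:
    """Toy model: the task only proceeds if it has a reachable root URL."""
    reachable = [url for url in root_urls if fetch_status(url) < 400]
    if not reachable:
        # No valid entry point: the task stops here, so deletes are never considered.
        return {"started": False, "deletes_considered": False}
    return {"started": True, "deletes_considered": True}


# Hypothetical status table standing in for real HTTP requests.
statuses = {
    "https://www.xxx.com/removed-page": 404,
    "https://www.xxx.com/": 200,
}
result_removed = run_task(["https://www.xxx.com/removed-page"], statuses.__getitem__)
result_home = run_task(["https://www.xxx.com/"], statuses.__getitem__)
```

In this model the removed page, used as the only root URL, stops the task before any delete logic runs, which matches the "Root url not found" error followed by "0 documents removed" in the log.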

Was this the original connector and task that indexed the document? Even if the task had been able to start, deletes are issued only for documents that the task has previously encountered, as recorded in its datastore DB. If that datastore is missing, the connector won't be aware that it's responsible for the IDOL document in question, and you may need to delete it manually from IDOL (or the Content Engine).
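
For the manual delete, one option is the `DREDELETEREF` index action, which removes documents by reference. The sketch below only builds the request URL and does not send it. The host, the index port `9001`, and the helper name `build_delete_url` are assumptions for illustration; check the index port and database name against your own Content configuration, and confirm that the document's reference is the page URL.

```python
from urllib.parse import urlencode


def build_delete_url(index_host: str, index_port: int, reference: str, database: str) -> str:
    """Build a DREDELETEREF index action URL for one document reference."""
    # urlencode percent-encodes the reference so the page URL survives as a parameter value.
    query = urlencode({"Docs": reference, "DREDbName": database})
    return f"http://{index_host}:{index_port}/DREDELETEREF?{query}"


# Assumed values: localhost and port 9001 are placeholders, not taken from the thread.
delete_url = build_delete_url(
    "localhost",
    9001,
    "https://www.xxx.com/apage-that-is-in-the-database/but-has-now-returns-404",
    "xxx",
)
print(delete_url)
```

The resulting URL can then be requested with a browser or any HTTP client that has access to the index port.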

The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.