Valued Contributor

Push URL into WebConnector

We would like to push a URL into WebConnector when it's created or updated within our CMS. Is there a way to push a URL into WebConnector rather than having it poll for what to index? I'm trying to avoid pushing directly into CFS; reproducing the parsing and all the other logic within WebConnector is not something I want to take on.

Accepted Solutions
Valued Contributor

Re: Push URL into WebConnector

Vinay,

Thank you for your response. The first answer got me where I needed to be. The final solution that we will be implementing is as follows.

To add a URL to the IDOL database, issue the following command:

http://webConnectorServer:7006/action=Fetch&FetchAction=Synchronize&TaskSection=immediate&Url0=https://our.website.com/path/to/resource&Url1=https://our.website.com/path/to/resourceNum2%26t=2

 

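Note that the URLs supplied need to be URL-encoded. In the example above, the & inside the second URL is sent as %26 so that it is not read as a separator between action parameters.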
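
As a rough illustration of the encoding step, the sketch below builds the Fetch action URL in Python. The helper name build_fetch_url and the sample page URL are made up for this example. It percent-encodes every reserved character in each page URL (not only &), on the assumption that the connector decodes parameter values in the usual way.

```python
from urllib.parse import quote


def build_fetch_url(connector, task_section, page_urls):
    """Build a Synchronize action URL that passes page_urls as Url0, Url1, ..."""
    parts = [
        "action=Fetch",
        "FetchAction=Synchronize",
        "TaskSection=" + quote(task_section, safe=""),
    ]
    for index, page_url in enumerate(page_urls):
        # safe="" also encodes ':', '/', '?', '=' and '&', so nothing in the
        # page URL can be mistaken for an action parameter separator.
        parts.append("Url{}={}".format(index, quote(page_url, safe="")))
    return connector.rstrip("/") + "/" + "&".join(parts)


fetch_url = build_fetch_url(
    "http://webConnectorServer:7006",
    "immediate",
    ["https://our.website.com/path/to/resource?a=1&t=2"],
)
print(fetch_url)
```

The CMS could request the resulting URL from its publish hook, for example with urllib.request.urlopen(fetch_url).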

The immediate fetch task named in TaskSection is defined as follows:

[immediate]
url0=https://our.website.com/
IngestEnableDeletes=true
StayOnSite=true
IndexDatabase=db1
Depth=0
SpiderUrlMustHaveRegex=(.*/our\.website\.com/.*)
SpiderUrlCantHaveRegex=(.*\.css$)|(.*.css\?.*$)|(.*\.js$)|(.*\.js,.*$)|(.*.js\?.*)|(.*.zip.*)|(.*\?utm.*)|(.*&utm.*)
UrlCantHaveRegex=(.*\.css$)|(.*.css\?.*$)|(.*\.js$)|(.*\.js,.*$)|(.*.js\?.*)|(.*.zip.*)|(.*\?utm.*)|(.*&utm.*)
UserAgent=IDOL12-immediate
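
To get a feel for what UrlCantHaveRegex excludes, the pattern can be tried against sample URLs before deploying it. This is only a sketch: it uses Python's re module, which may not behave identically to the connector's regex engine, and the sample URLs and the is_excluded helper are invented for the test.

```python
import re

# UrlCantHaveRegex copied from the [immediate] task above.
CANT_HAVE = re.compile(
    r"(.*\.css$)|(.*.css\?.*$)|(.*\.js$)|(.*\.js,.*$)|(.*.js\?.*)"
    r"|(.*.zip.*)|(.*\?utm.*)|(.*&utm.*)"
)


def is_excluded(url):
    """Return True if the URL matches any of the exclusion alternatives."""
    return CANT_HAVE.search(url) is not None


samples = [
    "https://our.website.com/path/to/resource",
    "https://our.website.com/css/site.css",
    "https://our.website.com/news?utm_source=mail",
    "https://our.website.com/unzipped-files",
]
for url in samples:
    print(url, "excluded" if is_excluded(url) else "kept")
```

Note that the last sample is excluded as well: the dot in (.*.zip.*) is unescaped, so it matches any character followed by "zip", not only a literal ".zip".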

 

You only need to use base64 encoding if you are supplying a whole fetch task through the Config parameter.

To find the port number, look in the [server] section of the webconnector.cfg file. You may also need to add a line for Access-Control-Allow-Origin=<cms-server-name> (or * if you don't mind who can affect your WebConnector, which is not recommended for production).
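
If the CMS integration needs to discover the port on its own, something like the following could read it from the configuration. This is a sketch under assumptions: the key is assumed to be called Port, the sample text stands in for a real webconnector.cfg, and a real file may contain constructs that Python's configparser does not accept.

```python
import configparser

# Stand-in for the contents of webconnector.cfg (hypothetical sample).
SAMPLE_CFG = """\
[Server]
// ACI port the connector listens on
Port=7006

[immediate]
Depth=0
"""


def read_server_port(cfg_text):
    """Return the Port value from the [server] section, matched case-insensitively."""
    parser = configparser.ConfigParser(
        comment_prefixes=("//", "#", ";"),  # the config uses // for comments
        interpolation=None,                 # keep % characters in regexes literal
        strict=False,
        allow_no_value=True,
    )
    parser.read_string(cfg_text)
    for name in parser.sections():
        if name.lower() == "server":
            return parser.getint(name, "port")
    raise KeyError("no [server] section found")


port = read_server_port(SAMPLE_CFG)
print(port)
```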

 

4 Replies

Micro Focus Frequent Contributor

Re: Push URL into WebConnector

Valued Contributor

Re: Push URL into WebConnector

Thanks. Looking at that option, it appears that it will kick off a sync for a fetch task. There is an option to specify an identifier, but that only helps if the identifier can be a URL: I'm looking to insert new content without recrawling the whole site or even a section.

Can the call look something like this?

http://host:port/action=Fetch&FetchAction=Synchronize&Identifiers=<new_url>

Or do I want to create a dummy Feeder fetch task and use the following?

http://host:port/action=Fetch&FetchAction=Synchronize&TaskSection=Feeder&[Feeder]Url0=<new url>

 

Micro Focus Frequent Contributor

Re: Push URL into WebConnector

You will need to use the Config parameter to set the new URL, along with other parameters such as MaxPages=1, so that only that page is crawled.

The Config parameter expects the task's configuration section from WebConnector.cfg as a base64-encoded string.

The configuration below will index only one URL. Suppose the following is the fetch task:

[BPay]
//The url(s) from which to start the crawl.
Url=https://en.wikipedia.org/wiki/BPAY
IndexDatabase=Finacial_Industry
//Regexes to restrict pages that are crawled for links. If a page is not crawled, it will not be indexed.
SpiderUrlMustHaveRegex=
SpiderUrlCantHaveRegex=.*\.css$|.*.css\?.*$|.*\.js$|.*\.js,.*$|.*.js\?.*|.*\/ScriptResource\.axd.*|.*\/WebResource\.axd.*

//Regexes to restrict pages that are indexed.
UrlMustHaveRegex=.*\BAY
UrlCantHaveRegex=
PageMustHaveRegex=
PageCantHaveRegex=
ContentTypeMustHaveRegex=
ContentTypeCantHaveRegex=(application|text)/(javascript|xml|x-javascript|css)(;.*)?

//The delay between processing pages, per sync thread
PageDelay=5s

//If StayOnSite=true, all links that are followed on a page must be on the same in order site to be crawled
StayOnSite=true

//Maximum ammount of time to spend crawling, 0 indicating unlimited
SiteDuration=0s

//Maximum number of pages to ingest per synchronize run, 0 indicating unlimited
MaxPages=1
 
Then the following is its base64 encoding:
 
W0JQYXldCi8vVGhlIHVybChzKSBmcm9tIHdoaWNoIHRvIHN0YXJ0IHRoZSBjcmF3bC4KVXJsPWh0dHBzOi8vZW4ud2lraXBlZGlhLm9yZy93aWtpL0JQQVkKSW5kZXhEYXRhYmFzZT1GaW5hY2lhbF9JbmR1c3RyeQovL1JlZ2V4ZXMgdG8gcmVzdHJpY3QgcGFnZXMgdGhhdCBhcmUgY3Jhd2xlZCBmb3IgbGlua3MuIElmIGEgcGFnZSBpcyBub3QgY3Jhd2xlZCwgaXQgd2lsbCBub3QgYmUgaW5kZXhlZC4KU3BpZGVyVXJsTXVzdEhhdmVSZWdleD0KU3BpZGVyVXJsQ2FudEhhdmVSZWdleD0uKlwuY3NzJHwuKi5jc3NcPy4qJHwuKlwuanMkfC4qXC5qcywuKiR8LiouanNcPy4qfC4qXC9TY3JpcHRSZXNvdXJjZVwuYXhkLip8LipcL1dlYlJlc291cmNlXC5heGQuKgoKLy9SZWdleGVzIHRvIHJlc3RyaWN0IHBhZ2VzIHRoYXQgYXJlIGluZGV4ZWQuClVybE11c3RIYXZlUmVnZXg9LipcQkFZClVybENhbnRIYXZlUmVnZXg9ClBhZ2VNdXN0SGF2ZVJlZ2V4PQpQYWdlQ2FudEhhdmVSZWdleD0KQ29udGVudFR5cGVNdXN0SGF2ZVJlZ2V4PQpDb250ZW50VHlwZUNhbnRIYXZlUmVnZXg9KGFwcGxpY2F0aW9ufHRleHQpLyhqYXZhc2NyaXB0fHhtbHx4LWphdmFzY3JpcHR8Y3NzKSg7LiopPwoKLy9UaGUgZGVsYXkgYmV0d2VlbiBwcm9jZXNzaW5nIHBhZ2VzLCBwZXIgc3luYyB0aHJlYWQKUGFnZURlbGF5PTVzCgovL0lmIFN0YXlPblNpdGU9dHJ1ZSwgYWxsIGxpbmtzIHRoYXQgYXJlIGZvbGxvd2VkIG9uIGEgcGFnZSBtdXN0IGJlIG9uIHRoZSBzYW1lIGluIG9yZGVyIHNpdGUgdG8gYmUgY3Jhd2xlZApTdGF5T25TaXRlPXRydWUKCi8vTWF4aW11bSBhbW1vdW50IG9mIHRpbWUgdG8gc3BlbmQgY3Jhd2xpbmcsIDAgaW5kaWNhdGluZyB1bmxpbWl0ZWQKU2l0ZUR1cmF0aW9uPTBzCgovL01heGltdW0gbnVtYmVyIG9mIHBhZ2VzIHRvIGluZ2VzdCBwZXIgc3luY2hyb25pemUgcnVuLCAwIGluZGljYXRpbmcgdW5saW1pdGVkCk1heFBhZ2VzPTEK
 
You can specify all of the config parameters your job needs using the above approach.
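
The encoding step can be sketched in Python as follows. The function names are made up for this example, the two-line section is a trimmed-down stand-in for a full task, and the way the encoded string is attached to the action URL (as Config=... on the Synchronize action) is an assumption based on the description above, not a confirmed call signature.

```python
import base64
from urllib.parse import quote


def encode_task_section(section_text):
    """Base64-encode a fetch task section exactly as it would appear in the cfg file."""
    return base64.b64encode(section_text.encode("utf-8")).decode("ascii")


def build_config_action(connector, section_text):
    """Assumed URL shape: the encoded section is passed as the Config parameter."""
    encoded = encode_task_section(section_text)
    # Standard base64 can contain '+', '/' and '=', so it is URL-encoded too.
    return "{}/action=Fetch&FetchAction=Synchronize&Config={}".format(
        connector.rstrip("/"), quote(encoded, safe="")
    )


section = "[BPay]\nMaxPages=1\n"
encoded_section = encode_task_section(section)
print(encoded_section)
print(build_config_action("http://host:7006", section))

# Decoding a posted string is a quick way to check what a task really contains.
print(base64.b64decode(encoded_section).decode("utf-8"))
```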