Absent Member

SM7.10 web crawling

Hi,
Has anybody successfully implemented web crawling (i.e. indexing external web sites and using the SM search engine to search this external knowledge)? I'm trying, but so far I have not been successful with the configuration.
Thanks for any help

Michaela

Absent Member


Has nobody used the web crawling feature? Please help if you have even the slightest experience with it; I really need to get this working and I'm still far from success.

Thanks,
Michaela

Absent Member

In fact, I have exactly the same problem with shared disk crawling: I get the same error in sm.log, "com.verity.api.administration.ConfigurationException: java.io.IOException: Read timed out". No clue...
Could you give me an example of the field definitions for a fslib/weblib? As there are no field names, I don't know what to write here.

Thanks a lot
Michaela

Absent Member

Hi,

I get the same error in 7.01 KM when trying to web crawl. Has anyone out there ever achieved this? And, as Michaela asks, would anyone care to show workable field definitions and type information for a web crawl?

cheers


peter

Absent Member

Hi Michaela,

I can index the C: drive on the server (see attachment; obviously more work is needed on the definitions).

I still cannot crawl the intranet or the web. We have asked HP to provide us with definitions for an external site that will work. After all, they wrote SM7, so they should have tested it, shouldn't they?

cheers



Established Member

Hi,
I have the same problem as you. I'm trying to download the attachment but I can't. Could you attach it again?

Thank you very much.
BR,
Joana.

Absent Member

Hello all


1) Re-attaching the PNG files for an fslib index (zip file).

2) HP advise that crawling an intranet has problems. They don't say what sort.

3) The com.verity error is actually a timeout. You need to dig into the KM directories to find the logs. FYI, there is also a Jobs directory where KM builds a job for each of the indexes. But watch out: it stores your PASSWORD in plain text, so if you are going to crawl an external site you should create a new account to get past the proxy.

D:\Program Files\HP\Service Manager 7.00\Search Engine <
4) I have seen "running low on memory" errors in my logs, so I am currently re-indexing on a 16 GB box.

5) Tomorrow I will try to crawl the web again. If successful, I will post the definitions.

6) Michaela, please come back and let us know how you are getting on.
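
For anyone repeating the hunt in point 3, here is a rough Python sketch that walks a directory tree and reports lines mentioning the timeout or a password. It assumes the KM logs and job files are plain text; the function name `scan_tree` and the search strings are illustrative choices, not anything shipped with the product, and the actual file layout under the Search Engine directory is not known from this thread.

```python
import os

# Strings to look for. Both come from this thread, but where (and whether)
# they appear on disk is an assumption.
NEEDLES = ("Read timed out", "password")


def scan_tree(root, needles=NEEDLES):
    """Return (path, line_number, needle) for every matching line under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on odd encodings.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        lowered = line.lower()
                        for needle in needles:
                            # Case-insensitive substring match.
                            if needle.lower() in lowered:
                                hits.append((path, number, needle))
            except OSError:
                # Unreadable or locked files are skipped, not fatal.
                continue
    return hits
```

Pointing it at the install directory, for example `scan_tree(r"D:\Program Files\HP\Service Manager 7.00\Search Engine")`, should list candidate log lines for the timeout and any job file that mentions a password, so you know which files to lock down or clean up.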

have a fun day!

peter

Absent Member

Hi all,

Sorry, I stopped watching this thread and only read the replies today.

In the end, we succeeded with the web crawling setup. The exception that I mentioned earlier occurred only in our internal environment and was related to the specific computer setup there. When I did exactly the same on VMware (and later at the customer's site), the exception was gone.

During the project implementation, we did indeed find (and reported to support) that there can be problems on an intranet, related to secured websites. We needed to crawl HTTPS intranet websites and IIS websites that use Integrated Windows Authentication, but the search engine is currently unable to search these sites.
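
To check in advance whether a site falls into one of these two categories, something like the following Python sketch may help. It is not part of Service Manager. It assumes that Integrated Windows Authentication shows up as an HTTP 401 response offering the `Negotiate` or `NTLM` scheme in `WWW-Authenticate`, which is how IIS normally advertises it. The names `crawl_obstacles` and `probe` are made up for this example.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Auth schemes that IIS advertises for Integrated Windows Authentication.
IWA_SCHEMES = {"negotiate", "ntlm"}


def crawl_obstacles(url, status, auth_headers):
    """List which of the two obstacles named in this thread apply to a response."""
    obstacles = []
    if urlsplit(url).scheme.lower() == "https":
        obstacles.append("https")
    if status == 401:
        # The first token of each WWW-Authenticate value is the scheme name.
        offered = {h.split(None, 1)[0].lower() for h in auth_headers if h.strip()}
        if offered & IWA_SCHEMES:
            obstacles.append("integrated-windows-auth")
    return obstacles


def probe(url, timeout=10):
    """Fetch url once and classify it. Needs network access to the site."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return crawl_obstacles(url, resp.status, [])
    except urllib.error.HTTPError as err:
        headers = err.headers.get_all("WWW-Authenticate") or []
        return crawl_obstacles(url, err.code, headers)
```

An empty list from `probe(...)` only means that neither of the two obstacles was detected; it does not guarantee that the crawler will index the site. Connection failures and certificate errors are deliberately not caught, so they surface as exceptions.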

Hope that helps,
Michaela