Absent Member.

Detecting Web Crawlers using Pattern Discovery


I'm trying to use Pattern Discovery to detect patterns in HTTP logs. All I have to work with is the full URL (http://example.com/folder/different_stuff), plus source and destination IP. I need to detect the IPs that are "sweeping" or "crawling" the site: either trying to download the whole site, or extracting information from a subsection of the site or from the search page.
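Outside of Pattern Discovery, the underlying heuristic is simple: a crawling IP requests many distinct URLs, while a normal visitor revisits a small set. As a rough illustration (not the ArcSight implementation), here is a minimal Python sketch that flags source IPs exceeding a unique-URL threshold; the threshold of 50 is an assumed value you would tune to your site's normal traffic:

```python
from collections import defaultdict

def find_crawlers(events, min_unique_urls=50):
    """Flag source IPs that request an unusually high number of
    distinct URLs -- a simple heuristic for sweep/crawl behaviour.

    events: iterable of (source_ip, url) pairs pulled from the logs.
    min_unique_urls: assumed threshold; tune to your traffic profile.
    """
    urls_by_ip = defaultdict(set)
    for src_ip, url in events:
        urls_by_ip[src_ip].add(url)
    return {ip for ip, urls in urls_by_ip.items()
            if len(urls) >= min_unique_urls}

# Example: one IP sweeping 100 distinct pages, another fetching
# a single page twice (repeat hits don't count as crawling).
events = [("10.0.0.5", f"http://example.com/page/{i}") for i in range(100)]
events += [("10.0.0.9", "http://example.com/index.html")] * 2
print(find_crawlers(events))  # {'10.0.0.5'}
```

Pattern Discovery works differently (it mines recurring event sequences), but this is the kind of behaviour you are asking it to surface.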

The firewall (Stonegate) puts the URL into the event.message field. I'm trying to create a profile:


and run it, but I receive this:


which probably means that no actual patterns have been discovered. But they surely are in those logs: bots are constantly crawling the site and draining the search engine and application servers.

Can someone explain how to use Pattern Discovery for this particular task?


1 Reply
Absent Member.

Sorry to resurrect this old question, but since I came across it, I thought I'd put a hint here for anyone who has the same problem but hasn't asked (121 views...).

This is easy to fix: the error is actually quite descriptive. It refers to a default limit of 20,000 events, which is set in the server.properties file for ESM (you'll need to add the line yourself if the limit is still at its default):

patterns.maxUniqueEvents=[your number here]
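For example, the added line might look like the fragment below. The value 100000 is just an illustrative number, and the exact location of server.properties depends on your ESM install (on a typical Manager install it lives under the Manager's config directory); check your ESM administration guide for the path and restart the Manager after editing:

```properties
# server.properties (ESM Manager configuration)
# Raise the Pattern Discovery unique-event cap above the 20000 default.
patterns.maxUniqueEvents=100000
```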
