Detecting Web Crawlers using Pattern Discovery
I'm trying to use Pattern Discovery to detect patterns in HTTP logs. All I have to work with is the full URL (http://example.com/folder/different_stuff), the source IP, and the destination IP. I need to detect the IPs that are "sweeping" or "crawling" the site, either trying to download the whole site or to extract information from a subsection of it, or from the search page.
Firewall (Stonegate) puts URL to event.message field. I'm trying to create a profile:
and run it, but I receive this:
which probably means that no actual patterns have been discovered. But they surely are in those logs: bots are constantly crawling the site and consuming resources on the search engine and application servers.
Can someone explain how to use PD for my particular task?
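For reference, the behaviour I want to flag can be sketched outside of Pattern Discovery with a simple count of distinct URL paths per source IP. The tuple layout and the threshold below are assumptions for illustration, not the actual Stonegate/ESM event schema:

```python
# Sketch: flag "sweeping" source IPs by counting distinct URL paths per IP.
# The (source_ip, url) tuple layout and the threshold are assumptions,
# not the real Stonegate/ESM field names.
from collections import defaultdict
from urllib.parse import urlparse

def sweeping_ips(events, min_distinct_paths=50):
    """events: iterable of (source_ip, url) pairs.
    Returns the set of IPs that requested at least
    min_distinct_paths distinct URL paths."""
    paths_by_ip = defaultdict(set)
    for src_ip, url in events:
        paths_by_ip[src_ip].add(urlparse(url).path)
    return {ip for ip, paths in paths_by_ip.items()
            if len(paths) >= min_distinct_paths}

# Example: one IP hitting many distinct pages looks like a crawler,
# while another IP repeatedly fetching one page does not.
events = [("10.0.0.1", f"http://example.com/folder/page{i}") for i in range(100)]
events += [("10.0.0.2", "http://example.com/index.html")] * 100
print(sweeping_ips(events))  # {'10.0.0.1'}
```

The point of Pattern Discovery would be to find such repeated URL sequences automatically instead of hard-coding a threshold like this.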
Sorry to resurrect this old question, but since I came across it, I thought I'd leave a hint here for anyone who has the same problem but hasn't asked (121 views...).
This is easy to fix. The error is actually quite descriptive: it refers to a default limit of 20,000 events, which is set in the server.properties file for ESM (you'll need to add the line yourself if it is still at the default):
patterns.maxUniqueEvents=[your number here]
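For example, to raise the limit to 50,000 unique events (the value 50000 here is just an illustrative choice, not a recommendation), the line in server.properties would look like:

```
patterns.maxUniqueEvents=50000
```

Pick a value large enough to cover the number of unique events in the time window your Pattern Discovery profile scans.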