
URL Exclusions & Inclusions in HP WebInspect
Hi Team,
I need HP WebInspect 16.10 to scan only a small set of URLs out of a large list of URLs in an application. We have the option to write a regex for excluding URLs or patterns, but not for including them. In my case the URLs to be excluded are many and the URLs to be included are few, so I decided to write a 'negation' rule for the URLs to be included: URLs covered by the negation stay in the scan and all other URLs are excluded. I tried writing such a rule but could not get it to work. Could you please help me write it? My requirement is below.
Main URL: http://samplescan.com/crawl/spider/
Under the spider folder, I need the pages below to be included in the scan. All other pages have to be excluded from the scan.
Pages to be included: testadmin.html, testaccount.html, testpage.html
Pages to be excluded: admin.html, adminaccount.html, usersmith.html, ... 1oothpage.html
Thanks in advance!!
Regards,
SunnyK


This regex should match what you want to include:
spider\/test[a-z]+\.html
So if you wish to exclude everything EXCEPT those that are matched using the above regex, I believe this should work:
spider\/(?!test[a-z]+\.html)
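As a rough check of the two patterns above, here is a minimal sketch using Python's `re` module. This is only an illustration under stated assumptions: Python is not the engine WebInspect uses, so behaviour inside the tool may differ, and the `SAMPLES` list and `matched` helper are made up from the page names in the original question.

```python
import re

# Patterns copied from the reply above.
INCLUDE = re.compile(r"spider\/test[a-z]+\.html")
EXCLUDE = re.compile(r"spider\/(?!test[a-z]+\.html)")

BASE = "http://samplescan.com/crawl/spider/"
# Sample URLs built from the page names in the question (illustrative only).
SAMPLES = [BASE + page for page in (
    "testadmin.html", "testaccount.html", "testpage.html",
    "admin.html", "adminaccount.html", "usersmith.html",
)]

def matched(pattern, urls):
    """Return the URLs in which the pattern is found anywhere (re.search)."""
    return [u for u in urls if pattern.search(u)]

included = matched(INCLUDE, SAMPLES)
excluded = matched(EXCLUDE, SAMPLES)

for url in SAMPLES:
    print("exclude" if url in excluded else "keep   ", url)
```

In Python, at least, the negative lookahead `(?!...)` rejects only the position right after `spider/` when a `test...html` name follows, so the three test pages are kept. Note that under the same assumptions the lookahead pattern also matches the bare folder URL, because it requires nothing after `spider/`.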

Hi Pprofili,
The exclusion regex below is not matching the URLs, although the inclusion regex does match.
spider\/(?!test[a-z]+\.html)
I tried similar regexes (using "?!" and "^") earlier, but didn't have any luck 😞

This regex trick is sometimes referred to as an Inclusive Exclusion (or maybe Exclusive Inclusion). ;-} I worked up the following alternative regex for you using the included Regular Expression Editor tool:
spider\/(?!test[a-z]+\.html)\w+
When I used this regex in the tool, it matched only the sample lines that were neither the base URL "http://samplescan.com/crawl/spider/" nor one of the "...spider/test..." pages that should stay in the scan. If the folder name "spider" is not unique enough to identify this single structure in the site, you might expand the regex to include more of the URI, such as "\/crawl\/spider\/(?!test[a-z]+\.html)\w+"
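To illustrate why a longer URI prefix can matter, here is a small sketch in Python's `re` module (not the engine WebInspect uses, so treat it as an approximation). The `/other/spider/` URL is a hypothetical second folder invented for this example; it does not come from the original question.

```python
import re

SHORT = re.compile(r"spider\/(?!test[a-z]+\.html)\w+")
SCOPED = re.compile(r"\/crawl\/spider\/(?!test[a-z]+\.html)\w+")

# Hypothetical URLs: the second "spider" folder is invented for illustration.
in_scope = "http://samplescan.com/crawl/spider/admin.html"
elsewhere = "http://samplescan.com/other/spider/admin.html"

for name, pattern in (("short", SHORT), ("scoped", SCOPED)):
    print(name, bool(pattern.search(in_scope)), bool(pattern.search(elsewhere)))
# short True True
# scoped True False
```

Under these assumptions the short pattern would also exclude pages in any other folder named "spider", while the scoped pattern only touches pages under "/crawl/spider/".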
What you would do next is add this as a Session Exclusion in WebInspect's Scan Settings, as follows:
Exclusion Name = Include Test pages only
Target = "URL"
Match Type = "matches regex"
Match String = spider\/(?!test[a-z]+\.html)\w+
Full line will read as "URL" + "matches regex" + "spider\/(?!test[a-z]+\.html)\w+"
Be sure to press the green plus sign (+) in order to add this line to the Session Exclusion you are creating, then press OK to save it and return to the scan settings screen.
Within the Regular Expression Editor, I used the samples below for the Search Text block; you can add more actual samples for your own verification.
+++++++++++++++++++++++++++
http://samplescan.com/crawl/spider/
http://samplescan.com/crawl/spider/testadmin.html
http://samplescan.com/crawl/spider/testaccount.html
http://samplescan.com/crawl/spider/testpage.html
http://samplescan.com/crawl/spider/admin.html
http://samplescan.com/crawl/spider/adminaccount.html
http://samplescan.com/crawl/spider/usersmith.html
http://samplescan.com/crawl/spider/1oothpage.html
+++++++++++++++++++++++++++
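The same check can be approximated outside the tool. The sketch below runs the final pattern over the sample lines above with Python's `re` module; this is an assumption for illustration only, since Python is not WebInspect's regex engine, and the variable names are made up here.

```python
import re

# Final exclusion pattern from the post above.
PATTERN = re.compile(r"spider\/(?!test[a-z]+\.html)\w+")

# Same sample lines as the Search Text block.
SEARCH_TEXT = """\
http://samplescan.com/crawl/spider/
http://samplescan.com/crawl/spider/testadmin.html
http://samplescan.com/crawl/spider/testaccount.html
http://samplescan.com/crawl/spider/testpage.html
http://samplescan.com/crawl/spider/admin.html
http://samplescan.com/crawl/spider/adminaccount.html
http://samplescan.com/crawl/spider/usersmith.html
http://samplescan.com/crawl/spider/1oothpage.html
"""

excluded, kept = [], []
for url in SEARCH_TEXT.splitlines():
    # A match means the Session Exclusion would apply to this URL.
    (excluded if PATTERN.search(url) else kept).append(url)

print("Excluded from scan:")
for url in excluded:
    print("  " + url)
print("Kept in scan:")
for url in kept:
    print("  " + url)
```

In this sketch the base URL and the three test pages are kept, and the other four samples are excluded. One caveat under the same assumptions: because `[a-z]+` matches lowercase letters only, a hypothetical page such as test1.html would still be excluded, so it is worth adding your real page names to the samples before relying on the pattern.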
-- Habeas Data
Micro Focus Fortify Customers-Only Forums – https://community.softwaregrp.com/t5/Fortify/ct-p/fortify