Absent Member.

URL Exclusion & Inclusions in HP Web Inspect

Hi Team,

I need HP WebInspect 16.10 to scan only a small set of URLs out of a large list of URLs in an application. WebInspect has an option to write regexes for excluding URLs or patterns, but not for including them. In my case the URLs to exclude are many and the URLs to include are few, so I decided to write a 'negation' rule for the URLs to include: URLs covered by the negation stay in the scan, and all other URLs are excluded. I tried writing such a rule but could not get it to work. Could you please help me write it? My requirement is below.

Main URL: http://samplescan.com/crawl/spider/

Under the spider folder, I need the pages below to be included in the scan. All other pages must be excluded from the scan.

Pages to be included: testadmin.html, testaccount.html, testpage.html

Pages to be excluded: admin.html, adminaccount.html, usersmith.html.............1oothpage.html

Thanks in advance!!

Regards,

SunnyK

 

3 Replies
Admiral

This regex should match what you want to include:

spider\/test[a-z]+\.html

So if you wish to exclude everything EXCEPT those that are matched using the above regex, I believe this should work:

spider\/(?!test[a-z]+\.html)
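
A quick way to sanity-check this negative lookahead outside WebInspect is a short Python sketch. Note the assumptions: Python's `re` module stands in for WebInspect's regex engine (the two may behave differently), and the helper name `is_excluded` is made up for illustration. The sample URLs come from the question.

```python
import re

# Exclusion pattern from this reply: "spider/" NOT followed by test<letters>.html
EXCLUDE = re.compile(r"spider\/(?!test[a-z]+\.html)")

BASE = "http://samplescan.com/crawl/spider/"
samples = [BASE, BASE + "testadmin.html", BASE + "admin.html"]


def is_excluded(url):
    """Hypothetical helper: True if the exclusion pattern matches anywhere in the URL."""
    return EXCLUDE.search(url) is not None


for url in samples:
    print(url, "->", "excluded" if is_excluded(url) else "kept")
```

In this sketch the pattern also matches the bare folder URL `http://samplescan.com/crawl/spider/`, because nothing after `spider/` means the lookahead has nothing to reject. That may or may not matter for a given scan, but it is worth knowing before using the pattern as an exclusion.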
Absent Member.

Hi Pprofili,

The exclusion regex below does not match the URLs, although the inclusion regex does match.

spider\/(?!test[a-z]+\.html)

I tried similar regexes (using "?!" and "^") earlier, but had no luck 😞

Micro Focus Expert

This regex trick is sometimes referred to as an Inclusive Exclusion (or maybe Exclusive Inclusion). ;-} I worked up the following alternative regex for you using the included Regular Expression Editor tool:

       spider\/(?!test[a-z]+\.html)\w+

 

When I used this regex in the tool, it matched only those sample lines that were neither the base URL "http://samplescan.com/crawl/spider/" nor one of the "...spider/test..." pages. If the folder name "spider" is not unique enough to identify this single structure in the site, you might expand the regex to include more of the URI, such as "\/crawl\/spider\/(?!test[a-z]+\.html)\w+"
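
To illustrate why the longer form can be safer, here is a minimal Python sketch comparing the two patterns. The URL under `/other/spider/` is purely hypothetical (it is not from the original site), the helper name `matches` is made up, and Python's `re` is only an approximation of the engine WebInspect uses.

```python
import re

# Short form: any "spider/" folder anywhere in the URL
SHORT = re.compile(r"spider\/(?!test[a-z]+\.html)\w+")
# Expanded form: only the "/crawl/spider/" folder
LONG = re.compile(r"\/crawl\/spider\/(?!test[a-z]+\.html)\w+")

in_scope = "http://samplescan.com/crawl/spider/admin.html"
# Hypothetical URL: a second folder that happens to be named "spider"
elsewhere = "http://samplescan.com/other/spider/admin.html"


def matches(pattern, url):
    """Hypothetical helper: True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


for url in (in_scope, elsewhere):
    print(url, "short:", matches(SHORT, url), "long:", matches(LONG, url))
```

Under these assumptions the short pattern would exclude pages in both folders, while the expanded one only touches pages under `/crawl/spider/`.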

What you would do next is add this as a Session Exclusion to WebInspect's Scan Settings, as follows

    Exclusion Name = Include Test pages only

    Target = "URL"

    Match Type = "matches regex"

    Match String = spider\/(?!test[a-z]+\.html)\w+

Full line will read as "URL" + "matches regex" + "spider\/(?!test[a-z]+\.html)\w+"

Be sure to press the green plus sign (+) in order to add this line to the Session Exclusion you are creating, then press OK to save it and return to the scan settings screen.

 

 

 

Within the Regular Expression Editor, I used the samples below in the Search Text block; you can add more real samples for your own verification.

+++++++++++++++++++++++++++

http://samplescan.com/crawl/spider/

http://samplescan.com/crawl/spider/testadmin.html

http://samplescan.com/crawl/spider/testaccount.html

http://samplescan.com/crawl/spider/testpage.html

http://samplescan.com/crawl/spider/admin.html

http://samplescan.com/crawl/spider/adminaccount.html

http://samplescan.com/crawl/spider/usersmith.html

http://samplescan.com/crawl/spider/1oothpage.html

+++++++++++++++++++++++++++
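
If you want to repeat this check outside the Regular Expression Editor, the sketch below runs the same regex over the same sample lines in Python. This is only an approximation: Python's `re` module is assumed to behave like WebInspect's engine for this pattern, and the helper name `split_scope` is invented for the example.

```python
import re

# Session Exclusion regex from this reply
EXCLUDE = re.compile(r"spider\/(?!test[a-z]+\.html)\w+")

BASE = "http://samplescan.com/crawl/spider/"
PAGES = [
    "",  # the base URL itself
    "testadmin.html",
    "testaccount.html",
    "testpage.html",
    "admin.html",
    "adminaccount.html",
    "usersmith.html",
    "1oothpage.html",
]


def split_scope(urls):
    """Hypothetical helper: split URLs into (kept, excluded) by the exclusion regex."""
    kept = [u for u in urls if not EXCLUDE.search(u)]
    excluded = [u for u in urls if EXCLUDE.search(u)]
    return kept, excluded


kept, excluded = split_scope([BASE + page for page in PAGES])
print("kept in scan:")
for url in kept:
    print("  ", url)
print("excluded:")
for url in excluded:
    print("  ", url)
```

With these samples the base URL and the three test pages should end up in the kept list, and the other four pages in the excluded list.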

 

 

 


-- Habeas Data
Micro Focus Fortify Customers-Only Forums – https://community.softwaregrp.com/t5/Fortify/ct-p/fortify