sn3 Absent Member.

Pattern-based link parsing vs. DOM-based link parsing

Is there a best practice or any advice I can follow when determining how to extract links (Pattern-based vs DOM-based)? I see that Pattern-based parsing is selected by default, but after reading the help menu I am under the impression that DOM-based parsing is more accurate.

Thank you


Accepted Solutions
Micro Focus Expert

Re: Pattern-based link parsing vs. DOM-based link parsing

I took this question to Development and got quite an education. The discussion centered on what the new Link Source scan setting panel means and what it offers the user.

For the public, I can share that DOM-based parsing is new and will be improved over the coming releases/years. Pattern-based parsing is the older (current) script parsing behavior of WebInspect (currently at v16.10), and it is quite aggressive. The upside is that WebInspect can locate links that would otherwise be terribly difficult to identify, which expands the attack surface considerably. The drawback is that it can just as frequently cause a "runaway scan", with many incorrect links being added to the Crawl count. This is particularly true for unrooted (relative) URLs found in the comments or script areas of a Master Page, since that content is visible throughout the site.
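To make the "runaway scan" risk concrete, here is a minimal Python sketch of pattern-style link harvesting. It is not WebInspect's actual implementation: the regular expression, the sample page, and the function name `pattern_links` are all assumptions made for illustration. The point is only that a pattern match has no idea whether a string came from a real link, a comment, or a script.

```python
import re

# Hypothetical page: one real anchor, plus path-like strings that only
# appear inside an HTML comment and a script block.
PAGE = """
<html><body>
<!-- old layout lived at legacy/home.aspx -->
<a href="/products/list.aspx">Products</a>
<script>var api = "services/cart.asmx";</script>
</body></html>
"""

# Illustrative pattern only: any run of path characters ending in a
# known page extension. A real scanner's patterns are far more involved.
PATH_LIKE = re.compile(r"[\w./-]+\.(?:aspx|asmx|html|php)")

def pattern_links(html):
    """Return every path-like string, regardless of where it appears."""
    return sorted(set(PATH_LIKE.findall(html)))

if __name__ == "__main__":
    for link in pattern_links(PAGE):
        print(link)
```

In this sketch the comment and script strings are collected alongside the genuine anchor, and two of the three results are unrooted, so a crawler would resolve them against every page that includes the shared markup.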

The newer DOM-based parser engine supports new metadata, so the engine can intelligently identify valid links, or at least those that the researcher is interested in following or wishes to permit. The developer's summary was that the newer DOM-based parser is, or will be, the superior one to use. The trade-off is that you want to narrow down which links WebInspect chases, but you have to weigh that against how much time you have in the day and how deeply you want it to chase links for you.
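As a rough illustration of why structure-aware parsing can be more selective, the sketch below collects links only from element attributes and records where each one came from. It is an assumption-laden stand-in, not WebInspect's engine: Python's `html.parser` is an event-driven tokenizer rather than a full DOM, and the class name `DomLinkParser`, the `allow_unrooted` flag, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: two anchors (one rooted, one unrooted), plus
# path-like strings in a comment and a script that are not real links.
PAGE = """
<html><body>
<!-- old layout lived at legacy/home.aspx -->
<a href="/products/list.aspx">Products</a>
<a href="details.aspx?id=1">Details</a>
<script>var api = "services/cart.asmx";</script>
</body></html>
"""

LINK_ATTRS = ("href", "src", "action")

class DomLinkParser(HTMLParser):
    """Collect (url, source) pairs from element attributes only."""

    def __init__(self, allow_unrooted=False):
        super().__init__()
        self.allow_unrooted = allow_unrooted
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                rooted = value.startswith(("/", "http://", "https://"))
                if rooted or self.allow_unrooted:
                    # Keep the originating tag and attribute as metadata.
                    self.links.append((value, f"{tag}.{name}"))

    # Comments and script bodies arrive through handle_comment and
    # handle_data, which are left as no-ops here, so they add no links.

def dom_links(html, allow_unrooted=False):
    parser = DomLinkParser(allow_unrooted=allow_unrooted)
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    print(dom_links(PAGE))
    print(dom_links(PAGE, allow_unrooted=True))
```

With the default setting only the rooted anchor is returned. Passing `allow_unrooted=True` adds the relative anchor, but the comment and script strings are still ignored because they never appear as element attributes.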

For WebInspect 16.10, if I were to enable the DOM-based parser on this Link sources scan settings panel, I think I would leave the sub-options as they are (especially "Allowed unrooted URLs" deselected), but I would also clear the box for "Include Comment Links (Aggressive)".


-- Habeas Data
Micro Focus Fortify Customers-Only Forums – https://community.softwaregrp.com/t5/Fortify/ct-p/fortify
The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.