requestUrlHost missing if hostname contains an underscore
Not sure if anyone else has noticed this, but there is a problem when the SmartConnector has to build requestUrl by combining the URL host, file name and query fields from the logs (usually proxy logs). If the site's DNS hostname contains an underscore, requestUrlHost ends up blank and requestUrl is not represented accurately: it looks like http:///filename.html, with the hostname portion missing. The simplest way to check is to search your proxy logs, if you are collecting them, for events where destinationHostName contains "_" and look at the requestUrl field.
This happens because, according to the RFCs, underscores are not supposed to appear in hostnames (RFC 952 and RFC 1123 allow only letters, digits and hyphens), and the SmartConnector framework uses a Java library that enforces strict RFC compliance. In practice, however, lots of sites have underscores in their hostnames; nytimes.com, amazonaws.com and akamaihd.net are just some of the second-level domains where I've seen this problem.
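For anyone who wants to reproduce the behavior outside ArcSight: I don't know which Java library the SmartConnector actually uses, but the JDK's own java.net.URI (which follows the older RFC 2396 grammar) shows the same symptom, so the sketch below assumes that class. getHost() returns null for a hostname containing an underscore, while getAuthority() still holds the full name. The class name and the hostOf/buildRequestUrl helpers are hypothetical, only meant to illustrate two possible workarounds: falling back to the raw authority, or concatenating the logged fields without parsing them at all.

```java
import java.net.URI;

// Hypothetical demo class; assumes the connector behaves like java.net.URI.
public class UnderscoreHostDemo {

    // Hypothetical fallback: when getHost() is null because the authority did
    // not parse as an RFC 2396 server-based host, use the raw authority with
    // any userinfo ("user@") and port (":8080") stripped off.
    static String hostOf(URI uri) {
        if (uri.getHost() != null) {
            return uri.getHost();
        }
        String authority = uri.getRawAuthority();
        if (authority == null) {
            return null;
        }
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            authority = authority.substring(0, colon);
        }
        return authority;
    }

    // Hypothetical alternative: build the request URL directly from the logged
    // fields, so the hostname is never validated against the RFC grammar.
    static String buildRequestUrl(String scheme, String host, String path, String query) {
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host).append(path);
        if (query != null && !query.isEmpty()) {
            sb.append('?').append(query);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // URI.create does not throw here: the authority is silently treated
        // as registry-based, so only the host component is lost.
        URI uri = URI.create("http://my_site.example.com/filename.html?id=1");
        System.out.println("getHost():      " + uri.getHost());      // null
        System.out.println("getAuthority(): " + uri.getAuthority()); // my_site.example.com
        System.out.println("fallback host:  " + hostOf(uri));
        System.out.println(buildRequestUrl("http", "my_site.example.com", "/filename.html", "id=1"));
    }
}
```

Running main should show getHost() as null, while the authority, the fallback host and the concatenated URL all keep my_site.example.com. Note that no exception is thrown, which would explain why the field silently comes out blank instead of the event failing to parse.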
It's my position that enforcing RFC compliance is the job of the products that provide the access, not ArcSight's. If those products fail to do so and create logs recording the events as they took place, I believe it is ArcSight's responsibility to capture each event accurately, exactly as it was logged. I just had a call with Product Management about this and I think they see it my way, but working around the problem will obviously take some development effort. I am trying to gauge how many others on this forum are affected by this.