Threaddumping - am I overreacting?
OK, maybe I am off base here, so I'm coming to the real experts -- the users.
I installed a fresh new Oracle Audit connector and connected it to two database instances. Right off the bat, I see thread dumps in the log directory. I send the complete logs off to tech support, along with the dumps.
I explain that, yes, the connector *appears* to be working, but I am concerned about the thread dumps: to me, they indicate something has gone wrong. After getting a first wrong answer (I hate it when I know more about ArcSight internals than the rep does), I get a final reply telling me that the thread dumps do not indicate a problem, and that unless a tech asks to see them, they should be ignored.
This does not smell right to me at all. I do understand that there is such a thing as "debug" information that one throws into a log for troubleshooting purposes, but isn't a thread dump a crash log, i.e., a sign that something unexpected has happened? Even if it was a recoverable issue, isn't it something they should be looking at??? At minimum, isn't excessive thread dumping a performance hit?
I'm posting the (mostly) complete thread below. I'd appreciate your comments. If I am wrong, I'll accept that, but it just seems to me a connector that creates 3 thread dumps in the first minute it is alive is trying to tell me something.
I'm not posting the actual logs or dumps here, but if anyone would like to see them, they are available.
Note to any ArcSight personnel: Case ID is 100902-000065.
After having multiple problems with our existing Oracle connector setup, I freshly installed a brand new connector.
However, even after installing the new connector, I am seeing frequent thread dumps. I am greatly concerned that there is still a fundamental unaddressed problem.
The connector is set up to connect to two Oracle databases, one of which is 9.2 and the other 10.2. The only override I have made in the properties file is to specify the "startatdate" parameter. In addition, I overrode the polling time (frequency), which the documentation oddly sets to "5" seconds, changing it to 1200. However, I am thinking that this may be a documentation error, i.e., the documentation should read time in MINUTES, not seconds.
Because of the production problems we are experiencing, I would appreciate an immediate examination here, as I intend to make an emergency change using this connector.
I've attached the complete log folder, the agent data folder, and the agent.properties file.
Thank you for attaching the log files. From my review of the logs, I am seeing the following error:
[2010-09-02 10:24:11,763][WARN ][default.com.arcsight.agent.loadable.agent._OracleAuditTrailDatabaseAgent][processQuery()] Failed to process query [[select count(SESSION_ID) from dba_common_audit_trail]] on database URL[jdbc:oracle:thin:@172.29.4.181:1538:CGFSPRD], bitmechanic URL[jdbc:bitmechanic:pool:qiF60yoBABCACQaxQIyZtw==]
[2010-09-02 10:24:11,763][WARN ][default.com.arcsight.agent.loadable.agent._OracleAuditTrailDatabaseAgent][doDatabaseDetection] Tried version [10.x/11.x]. ERROR: [[ORA-00942: table or view does not exist
These errors are typically associated with the permissions on the account accessing the database. Can you verify that the DB user account has sufficient privileges to access both databases and pull data? I will follow up with a call to you to verify.
<NAME>, there is no permissions issue. The connector does in fact successfully pull data from both databases.
The "error" you are citing is expected. It is the result of the "version" check that the connector performs as it tries to determine whether the database is Oracle version 9, 10, or 11. It does so by first trying a query that only succeeds in a "version 10" environment. If that fails, it knows the database is not "version 10", and then tries a "version 9" query. When that succeeds, it knows that the database is connectable and is a "version 9" database. NOTE the phrase "Tried version [10.x/11.x]" in the message.
In this connector, the CGFSCOP instance is a version 9.2 database and CGFSPRD is a 10.2 database.
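For readers unfamiliar with this probe-and-fall-back pattern, here is a minimal sketch of the logic as described above. It is not ArcSight's code. The function name `detect_oracle_version`, the `run_query` callback, and the 9.x probe against `dba_audit_trail` are assumptions for illustration; only the 10.x/11.x probe query comes from the connector log.

```python
from typing import Callable, Optional

# Probe queries, newest version first. The 10.x/11.x probe is the one shown in
# the connector log; the 9.x probe is a hypothetical placeholder (assumption).
PROBES = [
    ("10.x/11.x", "select count(SESSION_ID) from dba_common_audit_trail"),
    ("9.x", "select count(*) from dba_audit_trail"),
]


def detect_oracle_version(run_query: Callable[[str], object]) -> Optional[str]:
    """Return the first version whose probe query succeeds, else None.

    run_query is a hypothetical callback that executes SQL and raises on error
    (for example ORA-00942 when the probed view does not exist).
    """
    for version, sql in PROBES:
        try:
            run_query(sql)
        except Exception as exc:
            # A failed probe is expected on older databases: warn and fall back.
            print(f"WARN Tried version [{version}]. ERROR: {exc}")
            continue
        return version
    return None


if __name__ == "__main__":
    def fake_92(sql: str) -> int:
        # Simulates a 9.2 database, where dba_common_audit_trail is missing.
        if "dba_common_audit_trail" in sql:
            raise RuntimeError("ORA-00942: table or view does not exist")
        return 42

    print(detect_oracle_version(fake_92))  # prints a WARN line, then "9.x"
```

In this sketch the first probe fails with ORA-00942 and produces a WARN line like the one quoted from the log, then the fallback probe succeeds. The warning is part of normal detection, not a failure.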
I think we need to look closer at the thread dumps.
(Note: I apologized for my typos)
Upon further analysis of the log files, I am unable to identify any issues. The thread dumps are generated whenever there is a +/-20% change in the event rate. They are not harmful and do not indicate any harm; they are for troubleshooting purposes. If you experience a serious issue, our DEV team will request these thread dump files; otherwise, you can ignore them.
ArcSight Technical Support
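To make the rep's stated trigger concrete, here is a minimal sketch of a "+/-20% event rate change" rule. It only models the rule as the rep described it; the connector's real sampling interval, baseline, and comparison are unknown, and `should_dump` and `dump_triggers` are hypothetical names. It does show how a connector whose event rate is still ramping up could plausibly emit several dumps in its first minute.

```python
from typing import List


def should_dump(previous_rate: float, current_rate: float, threshold: float = 0.20) -> bool:
    """True when the rate moved by at least `threshold` relative to the previous sample."""
    if previous_rate <= 0:
        # No baseline yet: treat any nonzero rate as a change (assumption).
        return current_rate > 0
    return abs(current_rate - previous_rate) / previous_rate >= threshold


def dump_triggers(rates: List[float], threshold: float = 0.20) -> List[int]:
    """Indices of rate samples that would trigger a thread dump under this rule."""
    return [
        i for i in range(1, len(rates))
        if should_dump(rates[i - 1], rates[i], threshold)
    ]


if __name__ == "__main__":
    # Hypothetical events/sec samples while a freshly started connector ramps up.
    startup_rates = [0, 50, 120, 118, 90]
    print(dump_triggers(startup_rates))  # → [1, 2, 4]
```

Under this rule, three of the four rate changes cross the threshold, so three dumps would be written even though nothing is "wrong." Whether that matches what the connector actually does is exactly the open question in this thread.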
... I have not issued a reply.
Re: Threaddumping - am I overreacting?
The explanation provided in your conversation above seems plausible. Is there a reason you don't believe this to be the cause?
The "error" you are citing is expected. That is the result of the "version" check that the connector performs as it tries to determine whether the database is Oracle version 9 or 10 or 11.
Re: Threaddumping - am I overreacting?
That comment about the "error" was mine. The rep was initially blaming the thread dumps on the "error", but in fact those thread dumps do not line up in time with the version check. I was the one explaining to the rep how the "error" actually corresponded to the version check.
After I pointed out how the version check worked, he backed off that comment and then basically told me that thread dumps are not indicative of a potential problem.
I get that they might indicate "recoverable" problems. But if you see them frequently, they tell me that something is wrong!