Well, the CSV and XML files are gzipped. But I'm unable to recognize the format of the file containing the data, and gzipping it divides its size by about 4-5.
So if it is compressed, it is a really light compression.
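One way to poke at this: check the file's magic bytes and compare its size before and after gzipping. This is just a generic sketch (the filename is a placeholder, not the actual Logger data file); gzip files start with the bytes `1f 8b`, and a 4-5x gzip reduction usually means the data is uncompressed or only lightly compressed.

```shell
#!/bin/sh
# Placeholder path - substitute the actual data file you're inspecting.
FILE="events.dat"

# Guess the file type and dump the first two bytes
# (gzip's magic number is 1f 8b).
file "$FILE"
head -c 2 "$FILE" | od -An -tx1

# Compare the original size to the gzipped size; a ratio of 4-5
# suggests the contents are not already well compressed.
ORIG=$(wc -c < "$FILE")
GZ=$(gzip -c "$FILE" | wc -c)
awk -v o="$ORIG" -v g="$GZ" 'BEGIN { printf "compression ratio: %.1f\n", o/g }'
```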
I'm interested in your script. Could you please share it?
I don't want to reinvent the wheel.
I've been to many customer sites where searching archives is slower than the time span being queried -- for example, it takes longer than one day to search one day's worth of data.
Some of the DoD customers created a means of slurping Logger archives into Hadoop for elastic search. Cool stuff.
But yeah, you are correct -- archives are absolutely worthless at higher event rates. grep is faster.
Can you send me your shell script for archiving? I'm slogging through this now and find it confusing. I'm fairly new to ArcSight and have just read ESM 101, but I'm still confused. I get the gist of it, but all the details (like archiving) have me tied in a knot. Anyway, I am just trying to move from Express to a LUN on our Enterprise network.
P.S. My work email address is email@example.com
I am running into a similar issue with the Archive storage limit on our Express appliance. Could you share the shell script with me to move the archived zip files off to external storage?
We had Express and Logger installed about three months ago. I am getting to the point of setting up Logger archiving, but the description in the help page does not give me a comfortable feeling about archiving the event data.
Below is the excerpt from the logger help manual:
Once events that have been archived are deleted from Logger's local storage, they are not included in search operations. To include such events in search operations, you must load the archive in which those events exist back to the Logger. When an Event Archive is loaded, its events are included in searches, but the archive itself remains on the remote storage.
When events are archived, index information for those events is not archived. Therefore, when event archives are loaded, indices are not available. As a result, a search query that runs on archived events (that have been loaded on Logger) is slower than when the data was not archived because the index data for the archived data is not available.
My original thought was to schedule an archive task to back up yesterday's data to an external NFS mount. Then I would be safe and wouldn't have to worry about losing data. I didn't expect the archive operation to delete the original data, or at least I expected it to give the user the option to keep the data.
Luckily, the first archive operation failed due to some other issue, unknown at this point. But if it had succeeded, imagine how I would have felt when I later found out that the original data had been deleted as part of the archive process, and that running searches and reports on the archived data takes forever.
I like the product overall, but this archive "feature" is not an administrator-friendly design. I hope someone at ArcSight has heard about this and will take action to correct the situation soon.
So my conclusion so far is that it's more than just a waste of time -- it will hurt you badly down the road if you are not aware of this behavior, like yesterday's me. :(
It really depends on your retention period. If you have a low event rate and can keep a year or more of data on local Logger storage, you can still archive every day and search indexed data going back a year or more. If your event rate is high and you can only store, say, 60 days' worth of events, but you need to run a search going back 8 months, then you run into a problem. Either way, you should archive daily. This is all pre-CORR rollout.
If you are sending the same events to both Logger and Express, then I would just archive on Express, unless you're looking for some redundancy.
Express keeps the index information, so it will be faster for searching than Logger.
BTW, be aware that archiving is enabled by default on Express, so be sure to keep an eye on archive space usage.
Typically we have a script running on the express box that trims the local archives and at the same time copies to a remote share.
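A minimal sketch of that kind of trim-and-copy script, run from cron. All paths and the retention window here are assumptions (the actual archive directory depends on your appliance), and the local trim only runs if the copy to the remote share succeeded:

```shell
#!/bin/sh
# Sketch only - adjust paths and retention for your environment.
ARCHIVE_DIR="/opt/arcsight/logger/data/archives"   # assumed local archive path
REMOTE_DIR="/mnt/archive_nfs"                      # assumed NFS/remote share mount
KEEP_DAYS=30                                       # assumed local retention window

# Copy any archive files not yet present on the remote share,
# then trim local archives older than the retention window.
# The trim is skipped entirely if the copy fails.
if rsync -a --ignore-existing "$ARCHIVE_DIR"/ "$REMOTE_DIR"/; then
    find "$ARCHIVE_DIR" -type f -mtime +"$KEEP_DAYS" -delete
fi
```

Gating the `find -delete` on rsync's exit status means a failed NFS mount leaves the local archives untouched rather than silently deleting the only copy.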
The main reason to archive the data off the Logger is to prepare for the disastrous situation in which the Logger loses all its data. It isn't useful for much else, due to the way archiving works.
Thanks for your advice. I like your idea of only backing up the Express event logs. Do you know of any info/data kept in the Logger that is missing from the Express/ESM logs? Basically, what I need to know is whether I am going to miss anything if I only back up the Express event logs.
I am working on archiving the event data on Express too.
If you are going to be sending the same events to both, then you won't miss anything.
Right now the only thing you'll lose is the Logger's capability to have multiple archive locations (and therefore differing retention lengths).