Logger Archive Extractions

I've developed a simple script that lets you export CEF events directly from Logger archive files.

Brief disclaimer - this tool is not officially supported or maintained.  I'm providing this script here in the hope that someone finds it useful.  If you make any improvements, please feel free to share them back with the community.  What follows is the README from the tarball, as it has some good examples and instructions.

lacat

This is a simple utility that exports CEF records from a Logger archive file. It prints them to stdout by design, allowing the user to redirect them to a file or pipe them into something else (grep, awk, whatever) for further manipulation.
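For example, a quick way to pull out just the events mentioning a particular destination IP, without writing an intermediate file, is to pipe the output straight into grep (the file names here are the same ones used in the Examples below):

./lacat ArcSight_Data_1_0504403158265495556.dat ArcSight_Metadata_1_504403158265495556.csv | grep "dst=10.0.0.1"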

Written in Python (targeting 2.6.x) and using only standard libraries that should be available on all RHEL installations, it should be fairly self-contained.

Usage

$ ./lacat -h

Usage: lacat [options] path_to_dat path_to_meta

Extracts cef events from Logger Archive files to stdout

THIS SOFTWARE IS NOT SUPPORTED.  USE AT YOUR OWN RISK.

Why is it called lacat?

    Because "Logger_Archive_cat" was too long to type.

Options:
  -h, --help            show this help message and exit
  -j, --json            export as json instead of raw cef
  -f FILTER, --filter=FILTER
                        specify a key=val to filter records by. multiple -f
                        key=val allowed

The usage is hopefully straightforward and the implementation fast enough.  I'm still optimizing to squeeze out a bit more performance, so check back here for revisions.

Installation

Place the lacat file somewhere on your path and make it executable:

chmod +x lacat

Examples

Export raw CEF and capture in the file outfile.cef


./lacat ArcSight_Data_1_0504403158265495556.dat ArcSight_Metadata_1_504403158265495556.csv  > outfile.cef

Export all CEF records, one per line in JSON format, and capture in outfile.json

./lacat -j ArcSight_Data_1_0504403158265495556.dat ArcSight_Metadata_1_504403158265495556.csv  > outfile.json

Filter results by limiting output to destination IP 10.0.0.1

./lacat -f dst=10.0.0.1  ArcSight_Data_1_0504403158265495556.dat ArcSight_Metadata_1_504403158265495556.csv

Filter results by limiting output to destination IP 10.0.0.1 and UDP events only.

./lacat -f dst=10.0.0.1 -f proto=UDP  ArcSight_Data_1_0504403158265495556.dat ArcSight_Metadata_1_504403158265495556.csv

Notes

Multiple -f options can be specified to create an AND condition.  You can always specify -j to get each record output as JSON for ease of parsing with other languages.
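Since -j emits one JSON object per line, consuming the output from another script is straightforward.  As a minimal sketch (not part of lacat; the "dst" key below is purely illustrative), something like this walks the outfile.json produced in the example above:

    import json

    # Each line of "lacat -j ... > outfile.json" is one JSON object.
    with open('outfile.json') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Use whatever CEF keys your records actually contain;
            # 'dst' here is just an illustration.
            print(record.get('dst'))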

Comments
  • Samuel - The archive files are not exactly raw format.  First, each chunk of events is written in CEF, but with special binary delimiters that mark a couple of different record types.  Then each chunk is gzipped independently and bundled into a larger .dat file, and the metadata for each chunk is written to the .csv file.  Each chunk is also padded with binary metadata.

    Karl - we're exchanging emails now, trying to track down the issue.  My initial hunch is that the binary delimiter being used may change slightly between Logger versions.

    If you're interested in seeing CEF turned into JSON, we should talk.  I have this working in my lab at the moment for an as-yet-unannounced project (hopefully SOON), but I would be happy to give you code snippets and pointers to move in this direction.

    Could you do me a favor?

    Line 104 is

        if not r.startswith( cef_type):
            continue

    Could you change it to be

        if not r.startswith(cef_type):
            print hexlify(r[:60])
            continue

    (hexlify lives in the standard binascii module, so make sure lacat has a "from binascii import hexlify" at the top.)

    And share the output?  That will print, in hex, the first 60 bytes of the record it thinks it has found, and I can then compare that to the simple parser's criteria.

    I should probably add something like that as a debugging mode... 
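    To make that layout a little more concrete, here is a rough sketch (not taken from lacat) of walking gzip members that have been concatenated into a single .dat file.  It assumes each chunk is a standard gzip stream appended back to back and simply stops at the first thing it cannot decompress; a robust reader would use the per-chunk offsets recorded in the metadata .csv instead.

        import zlib

        def iter_chunks(path):
            # Walk gzip members concatenated into one .dat file.
            with open(path, 'rb') as f:
                data = f.read()
            while data:
                # 16 + MAX_WBITS tells zlib to expect a gzip header on each member.
                d = zlib.decompressobj(16 + zlib.MAX_WBITS)
                try:
                    chunk = d.decompress(data)
                except zlib.error:
                    break  # hit binary padding/metadata we don't understand
                yield chunk           # one decompressed chunk of delimited CEF records
                data = d.unused_data  # whatever follows this gzip member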





