Sentinel Collector SDK 2014 Updates: Collector Performance Testing

A recent thread (https://forums.netiq.com/showthread.php?52050) in the public forums, which you may have seen already, brings to light some new possibilities that I would like to cover today. So far in this series we've been talking about new offerings of the updated SDK and the code within; part of that development lifecycle would optimally include some kind of testing, and while we have given some coverage to unit testing, Sentinel and Log Manager are meant to handle event loads in many thousands of Events per Second (EPS), so performance testing also matters. The engineering team at NetIQ knows this, of course, and has given us some options in this area, so let's dive in.

If you read the collector documentation very often you may have stumbled upon some collection rate numbers in the past. The numbers that stick out in my head from years gone by are around 300 to 500 EPS for a single collector. This does not mean that a Collector Manager running multiple collector instances cannot go faster, nor does it mean that a connection cannot go faster, or even that a collector cannot exceed this rate; this simply means that generally speaking a single thread running a collector should be able to parse events somewhere in this range. Currently collectors are considered single-threaded, which imposes some limits on what a collector can do. I've heard rumors that may change, but until then the workaround for higher event rates is usually to split the load among multiple collector instances, or multiple Collector Manager (CM) instances.

I mention all of these details that may be documented because those limits are not necessarily imposed by the collector's code alone, but may also come from the overall infrastructure; multi-threading may help, but once a collector parses an event there is a lot more that the Collector Manager must do before the event is processed by the rest of Sentinel. Sending events along one ActiveMQ channel or another, performing caching to ensure that an unexpected outage does not lead to data loss, balancing system resources (CPU primarily) among multiple collectors or even other processes on the server, and other details can lead to collectors performing more slowly than one might expect. At the end of the day, the basics of more, faster, dedicated processors may be the only solution if you want to go faster. Still, how fast can a collector conceivably go on given hardware?

I mentioned in my article (https://www.netiq.com/communities/cool-solutions/sentinel-collector-sdk-2014-updates-debugger/) on the debugger that running events through the debugger can lead to event rates much faster than what could be done with the debugger in ESM. On my laptop I could get around 5,000 events per second through a collector just using the Eclipse interface to start that debugger, and this is probably two orders of magnitude faster than I could get events through the debugger in ESM (disclaimer: ESM implies my virtualized Sentinel environment and access across networks vs. all being on one quick system). What if the debugger were not imposing any overhead at all, though?

To show how this is done we're going to be spending some blissful time at the command line. You'll need to know where you extracted your SDK, and you'll need to know specifically where your collector development directory is, other than "right here on my screen in Eclipse under 'Collectors'" which isn't really a valid explanation as far as the command line is concerned. Chances are that it will be somewhere within your extracted SDK directory structure; if you have customized things so that it is elsewhere, I presume you can find that. If you really do not know where this is, try looking in your filesystem for one of the '2011.1' directories (specifically the one within a directory named after your collector's product), which should be the place to which we eventually want to browse within the shell.
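
For example, a search like this sketch may help turn it up, assuming the SDK lives somewhere under your home directory:

#Search for collector '2011.1' directories under your home directory; adjust the starting point if your SDK lives elsewhere
find ~ -type d -name '2011.1' -path '*collectors*' 2>/dev/null
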

Note: The terms shell, terminal, command line, and command line interface (CLI) are used interchangeably.

As a quick recap, the SDK has a couple of high-level directories within, namely 'current' (which holds the current SDK) and 'content' (which is where new content is generated by you). As has been the trend with this series, most of our time will be spent in 'content' because we are talking about performance testing of our new content. Start by browsing to that location in a shell:

#The '~' character refers to the current user's home directory, and will be used to abstract that for everybody following
#since that is likely where your SDK directory is, and is almost certainly where it should be.
cd ~/code/sentinel-plugin-sdk/content


From this location we need to browse down into the 2011.1 directory for our to-be-tested collector:

cd dev/collectors/ABSoftware/ABApplication/2011.1/


Within here we should see the files that are usually listed within Eclipse; in fact, if you wanted, you could abandon Eclipse entirely and do everything from here. I'll have another article on using 'ant' for more than mere performance testing to make that work end-to-end; more on that later.

ll   #or ls -l
total 72
-rw-r--r-- 1 ab users 335 Dec 23 14:21 build-local.xml
-rw-r--r-- 1 ab users 1958 Dec 23 14:21 build.xml
-rw-r--r-- 1 ab users 1756 Jan 8 08:34 connectionMethods.xml
-rw-r--r-- 1 ab users 258 Dec 23 14:21 controls.exclude
-rw-r--r-- 1 ab users 753 Dec 23 14:21 datasensitivity.map
-rw-r--r-- 1 ab users 429 Dec 23 14:21 deviceSupport.xml
drwxr-xr-x 1 ab users 30 Dec 24 09:23 docs
-rw-r--r-- 1 ab users 151 Dec 23 14:21 files.include
drwxr-xr-x 1 ab users 22 Dec 23 14:21 pack
drwxr-xr-x 1 ab users 114 Dec 23 14:21 parameters
-rw-r--r-- 1 ab users 568 Dec 23 14:21 plugin.properties
-rw-r--r-- 1 ab users 254 Dec 23 14:21 privlevel.map
-rw-r--r-- 1 ab users 377 Dec 23 14:21 protoEvt.map
-rw-r--r-- 1 ab users 1287 Dec 23 14:21 Rec2Evt.map
-rw-r--r-- 1 ab users 12775 Jan 8 23:38 release.js
-rw-r--r-- 1 ab users 81 Dec 23 14:21 release.properties
-rw-r--r-- 1 ab users 174 Dec 23 14:21 sqlquery.base
-rw-r--r-- 1 ab users 415 Jan 8 23:38 taxonomy.map


The only notable file in here that applies today is the 'build.xml' file. You do not need to read through it, but build.xml files are known in the IT world for being associated with 'ant', which is a build system for various things, especially Java-based things. Reviewing build.xml is outside the scope I am defining for this article, but if you do not see this file then you are in the wrong spot or your directory is somehow broken. This file is placed in the 2011.1 directory by default when creating a new collector, so its absence implies that things will not work. If this is the case, other tasks from within Eclipse for this collector should also fail, including any building, debugging, unit testing, etc. On the flip side, if it works in Eclipse, it should work from the CLI.

Now that you're in this directory, let's ensure your environment is set up correctly. As I mentioned, build.xml files are often associated with the 'ant' tool, so if you run 'ant' right now you should get something happening, notably a development build of your collector:

ant
Buildfile: /home/ab/code/sentinel-plugin-sdk/content/dev/collectors/ABSoftware/ABApplication/2011.1/build.xml
Trying to override old definition of task classloader

_buildTestPlugin:

_init:
[echo] /home/ab/apps/jdk1.7.0_67/jre/lib/ext:/usr/java/packages/lib/ext
[echo] Author information loaded from /home/ab/.netiq/pluginsdk/dev.properties
[echo] Using SDK located at /home/ab/code/sentinel-plugin-sdk/current/sdk
[mkdir] Created dir: /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/plugin
[mkdir] Created dir: /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/tmp

*snip for brevity*

[delete] Deleting: /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/tmp/htmlDoc/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test.html.tmp
[echo] Individual HTML fragements substitution in skeleton doc complete
[echo] HTML Document generated successfully
[copy] Copying 1 file to /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/plugin/docs
[move] Moving 1 file to /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test
[echo] HTML Document generated at : /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test.html
[echo] ABSoftware_ABApplication_2011.1r1-201501160900-internal-test.html should now be available in the build directory
[xmltask] Setting standalone

_package-plugin:

_prepackage-plugin:
[xmltask] Setting standalone
[zip] Building zip: /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test/ABSoftware_ABApplication_2011.1r1-201501160900-internal-test.clz.zip

Build Development Plug-in:

BUILD SUCCESSFUL
Total time: 3 seconds


For those of you following along, the output should resemble what you see in Eclipse when doing some kind of build, in particular a Development (vs. Production) build. The end MUST return 'BUILD SUCCESSFUL' or nothing we try to do going forward makes any sense, since performance testing a failed build is an exercise in futility. If 'ant' does not run, or tosses a brief error about 'command not found' or equivalent, then perhaps your environment is not set up properly. On openSUSE or SLES this can be fixed pretty easily:

sudo zypper in ant


On other platforms, you'll need to find an installer that will install 'ant' for your user or system, though since Eclipse apparently works to build the collector you may just need to modify your user's PATH to point to the directories where Java and/or ant reside. Describing that is probably well outside the scope of this article, so I'll just leave you with this to hopefully help: http://ant.apache.org/manual/install.html
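
As a sketch of the PATH approach, both locations below are placeholders (the JDK path mirrors the one in the build output above), so substitute wherever your JDK and ant actually live:

#Hypothetical install locations; adjust to match your system
export JAVA_HOME="${HOME}/apps/jdk1.7.0_67"
export PATH="${JAVA_HOME}/bin:${HOME}/apps/apache-ant/bin:${PATH}"
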

At this point you should be able to get the output shown above when running 'ant'. Another command to try is 'ant -p' which will show all available ant "targets" (things ant can do with this build.xml) for your project:

ant -p
Buildfile: /home/ab/code/sentinel-plugin-sdk/content/dev/collectors/ABSoftware/ABApplication/2011.1/build.xml
Trying to override old definition of task classloader

Main targets:

 Build Development Plug-in       Just creates the plugin ZIP; no docs or release
 Build Production Plug-in        This task builds the current ${plugintype} into an importable plugin ZIP file
 Convert to 2011.1               Converts a plugin that uses template version 6.1 to the 2011.1 template
 Run Collector Performance Test  Runs a performance test on this collector with a given connector dump (defaults to the variety dump in case unspecified).
 Run Development Tests           Run automated tests against the built plug-in
 Run Production Tests            Run automated tests against the built production plug-in
 Run Single Test                 Run a single automated test against the built plug-in
 RunCollectorDebug               Runs the local collector in the debugger
 help                            Provides some helpful instructions
Default target: Build Development Plug-in


For those not familiar with ant, using the -p option to see available targets is common practice and can be used for any old project. There is a lot more here than 'Build', including our topic today, 'Run Collector Performance Test', which is really exciting. We'll get into the others in a later article, as mentioned before.

At this point you do not really need to do anything else other than prepare your system to run a lot of tests for a while. This isn't a ten-second task, and just as with any performance test, any other work happening on the system could impact results. Watching a movie while doing this, running tests for a bunch of collectors at once, or anything else happening on the system will impact results, if only a little. Still, let's get it started:

#As with most things at the command line, case matters, so make sure the target name is exact.
#Use quotes so that spaces are preserved and the target name is sent as a single argument.
ant 'Run Collector Performance Test'


The first thing you should see is a bunch of output as the collector is built based on current code. Eventually you'll get to this point:

_prepackage-plugin:
[xmltask] Setting standalone
[zip] Building zip: /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160946-internal-test/ABSoftware_ABApplication_2011.1r1-201501160946-internal-test.clz.zip

_runPerformanceTest:
[input] Enter Input File: [../tests/data/variety.dump]


This is now asking you for an input file. If you have the variety.dump (a connector dump) file set up in '../tests/data' for the debugger or some other purpose, then that's sufficient; just press [Enter] to accept that default. A few more prompts will then come up:

  [input] Enter Cached Input Lines: [4000000]
[input] Enter Processed Line Count: [2000000]
[input] Enter rate of logging EPS (in seconds): [60]


I accepted the defaults for each of these, just hitting [Enter] each time. You are welcome to adjust them, but they are set as they are for a reason. The number of lines processed is chosen so that some kind of consistency can be witnessed in the parsing of events, basically avoiding too few data points, or hard-to-explain jumps and dips in performance for whatever reason. The next thing you should observe is your CPU fan spinning up. Be sure things are well-ventilated, as you are doing performance testing, which implies (in the case of a collector) that the CPU is doing a lot of work. The output starts out like this:

  [echo] Pausing execution momentarily before starting Performance Test..
[echo] Running Collector Performance Test...
[echo] input = ../tests/data/variety.dump
[echo] Processed Line Count = 2000000
[echo] Cached input Lines = 4000000
[echo] Rate of logging EPS = 60 seconds
[echo] Logging EPS to = /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160951-internal-test/realtimeEPS.out


One thing to note from this is that there is a log file that will show the EPS rate as it is calculated. If you want to know if your test is working, and how things are performing, just tail that file from another shell:

tail -f /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160951-internal-test/realtimeEPS.out


or use 'less' for the same purpose but with more features available than 'tail' (Ctrl+c to stop following):

less +F /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160951-internal-test/realtimeEPS.out


Eventually the following will show up automatically:

  [echo] Jan 16, 2015 9:51:42 AM esecurity.base.configuration.MetaDataManager init
[echo] INFO: Initializing metadata, container name null
[echo] Jan 16, 2015 9:51:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.Main main
[echo] INFO: Running performance test for collector in directory /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160951-internal-test/plugin with input ../tests/data/variety.dump, output ../tests/data/2011.1r1-201501160951-internal-test-performance.events for 2,000,000 iterations.
[echo] Jan 16, 2015 9:51:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 0.0 | Cumulative EPS : NaN | Input Count : 0 | Event Count : 0 | New Events Processed : 0
[echo] Jan 16, 2015 9:51:51 AM com.netiq.sentinel.collectorutil.CollectorTopLevel init
[echo] INFO: Rhino initialized, implementation version: Rhino 1.7 release 4 2012 06 18.
[echo] Jan 16, 2015 9:51:51 AM com.netiq.sentinel.collectorutil.CollectorTopLevel init
[echo] INFO: Rhino language version: 0.
[echo] Jan 16, 2015 9:51:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.BSFEngine runRhino
[echo] INFO: Context defined and objects correctly loaded into namespace.
[echo] Jan 16, 2015 9:51:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.BSFEngine runRhino
[echo] INFO: Rhino optimization level: 9.
[echo] Jan 16, 2015 9:51:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.JavaRegExp extendNativeRegExp
[echo] INFO: Installing Java regular expression extensions
[echo] Jan 16, 2015 9:51:57 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 9399.833333333334 | Cumulative EPS : 11119.8738170347 | Input Count : 56401 | Event Count : 56399 | New Events Processed : 56399
[echo] Jan 16, 2015 9:52:03 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15555.833333333334 | Cumulative EPS : 13522.441975977603 | Input Count : 149736 | Event Count : 149734 | New Events Processed : 93335
[echo] Jan 16, 2015 9:52:09 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15524.833333333334 | Cumulative EPS : 14225.31334192339 | Input Count : 242885 | Event Count : 242883 | New Events Processed : 93149
[echo] Jan 16, 2015 9:52:15 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15464.333333333334 | Cumulative EPS : 14546.868905742145 | Input Count : 335671 | Event Count : 335669 | New Events Processed : 92786
[echo] Jan 16, 2015 9:52:21 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15552.5 | Cumulative EPS : 14753.920759389186 | Input Count : 428986 | Event Count : 428984 | New Events Processed : 93315
[echo] Jan 16, 2015 9:52:27 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15405.0 | Cumulative EPS : 14864.868717393165 | Input Count : 521416 | Event Count : 521414 | New Events Processed : 92430
[echo] Jan 16, 2015 9:52:33 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15498.333333333334 | Cumulative EPS : 14957.032961682651 | Input Count : 614406 | Event Count : 614404 | New Events Processed : 92990
[echo] Jan 16, 2015 9:52:39 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15361.666666666666 | Cumulative EPS : 15008.283948257185 | Input Count : 706576 | Event Count : 706574 | New Events Processed : 92170
[echo] Jan 16, 2015 9:52:45 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15246.833333333334 | Cumulative EPS : 15034.682843201897 | Input Count : 798057 | Event Count : 798055 | New Events Processed : 91481
[echo] Jan 16, 2015 9:52:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14990.5 | Cumulative EPS : 15030.178906924391 | Input Count : 888000 | Event Count : 887998 | New Events Processed : 89943
[echo] Jan 16, 2015 9:52:57 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15465.666666666666 | Cumulative EPS : 15070.09618634953 | Input Count : 980794 | Event Count : 980792 | New Events Processed : 92794
[echo] Jan 16, 2015 9:53:03 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14545.166666666666 | Cumulative EPS : 15025.364357661358 | Input Count : 1068065 | Event Count : 1068063 | New Events Processed : 87271
[echo] Jan 16, 2015 9:53:09 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15306.5 | Cumulative EPS : 15047.064928325875 | Input Count : 1159904 | Event Count : 1159902 | New Events Processed : 91839
[echo] Jan 16, 2015 9:53:15 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15023.666666666666 | Cumulative EPS : 15045.182100474207 | Input Count : 1250046 | Event Count : 1250044 | New Events Processed : 90142
[echo] Jan 16, 2015 9:53:21 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14674.333333333334 | Cumulative EPS : 15020.036593442364 | Input Count : 1338091 | Event Count : 1338090 | New Events Processed : 88046
[echo] Jan 16, 2015 9:53:27 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15404.5 | Cumulative EPS : 15044.13806158506 | Input Count : 1430519 | Event Count : 1430517 | New Events Processed : 92427
[echo] Jan 16, 2015 9:53:33 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15231.666666666666 | Cumulative EPS : 15055.119745966425 | Input Count : 1521909 | Event Count : 1521907 | New Events Processed : 91390
[echo] Jan 16, 2015 9:53:39 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14928.833333333334 | Cumulative EPS : 15047.90363245868 | Input Count : 1611482 | Event Count : 1611480 | New Events Processed : 89573
[echo] Jan 16, 2015 9:53:45 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14677.333333333334 | Cumulative EPS : 15028.118948457437 | Input Count : 1699546 | Event Count : 1699544 | New Events Processed : 88064
[echo] Jan 16, 2015 9:53:51 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15266.0 | Cumulative EPS : 15039.9690995197 | Input Count : 1791142 | Event Count : 1791140 | New Events Processed : 91596
[echo] Jan 16, 2015 9:53:57 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 14638.333333333334 | Cumulative EPS : 15020.344538151005 | Input Count : 1878971 | Event Count : 1878970 | New Events Processed : 87830
[echo] Jan 16, 2015 9:54:03 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 15410.5 | Cumulative EPS : 15038.094221028863 | Input Count : 1971435 | Event Count : 1971433 | New Events Processed : 92463
[echo] Jan 16, 2015 9:54:05 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$CollectorRunnable$1 run
[echo] INFO: Done reading data, terminating script engine...
[echo] Jan 16, 2015 9:54:09 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger run
[echo] INFO: Instantaneous EPS for this interval: 4761.166666666667 | Cumulative EPS : 15039.29014550513 | Input Count : 2000000 | Event Count : 2000000 | New Events Processed : 28567
[echo] Jan 16, 2015 9:54:10 AM esecurity.ccs.comp.evtsrcmgt.collector.util.testbed.CollectorRunner$EPSLogger stop
[echo] INFO: Terminated
[echo]
[echo] COLLECTOR_NAME = ABSoftware ABApplication
[echo] DATE = Fri Jan 16 09:54:10 MST 2015
[echo] EVENTS_PER_SECOND = 15039.29014550513
[echo] EVENT_COUNT = 2000000
[echo] EVENT_COUNT_OBSERVER_CHANNEL = 2000000
[echo] EVENT_COUNT_UNSUPPORTED = 0
[echo] EVENT_NAME_UNIQUE_COUNT = 7
[echo] EXPLOIT_DETECT_DATA_PROVIDED_COUNT = 0
[echo] INFO_COUNT = 0
[echo] INPUT_COUNT = 2000000
[echo] INPUT_METHODS = SYSLOG(map)
[echo] OFFSET_COUNT = 0
[echo] TX_COUNT = 0
[echo] XDAS_ACCOUNT_MANAGEMENT_COUNT = 1400000
[echo] XDAS_SVCAPP_MANAGEMENT_COUNT = 400000
[echo] XDAS_SVCAPP_UTIL_COUNT = 200000
[echo]
[echo] Output Statistics file /home/ab/code/sentinel-plugin-sdk/content/build/Collector/ABSoftware_ABApplication_2011.1r1-201501160951-internal-test/ABSoftware ABApplication-performanceStatistics.txt

Run Collector Performance Test:

BUILD SUCCESSFUL
Total time: 2 minutes 54 seconds


A file is created showing statistics, the status of the build (performance test in this case) is shown, and a total time is shown. Going back a bit more, we see some statistics which are also in that statistics file (so no need to open it again, unless perhaps you are doing something automated via a script of your own) showing the event rate, event count, etc. If you are somebody who just wants one number to take away from this entire thing, that number is 15039: the number of events per second that the collector could handle in its performance-testing vacuum on the system where you are running the tests. Those who want more than one number may like the details above, where every few seconds the number of events processed since the previous reporting time is shown, which gives some idea of the variability of performance over time. It may be nice to have some basic statistics (median, mode (if applicable), range, standard deviation, etc.) generated automatically, but that is not yet there (I'll see if I can somehow make those happen and submit an update), though using the numbers above you can generate those on your own.
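
If you want some of those statistics now, they can be approximated from the realtimeEPS.out log with a quick pipeline; this is a rough sketch that assumes the log format shown above, and note that the first and last intervals include startup and shutdown, so you may want to trim them:

#Pull the per-interval EPS values out of realtimeEPS.out and summarize them with awk
grep -o 'Instantaneous EPS for this interval: [0-9.]*' realtimeEPS.out | awk '{
  v=$NF; sum+=v; sumsq+=v*v; n++;
  if (min=="" || v<min) min=v;
  if (v>max) max=v;
} END {
  mean=sum/n;
  printf "intervals=%d mean=%.1f min=%.1f max=%.1f stddev=%.1f\n", n, mean, min, max, sqrt(sumsq/n - mean*mean);
}'
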

If you have done performance testing in the past, a few questions have likely come to mind. These may include (please submit comments on others):


  1. What if I used different hardware?

  2. How much did other things on the system impact these numbers?

  3. What if I think that one type of event is parsed more quickly than another?

  4. How do I compare this number with the collector of my coworker down the hall?

  5. How do these numbers compare with NetIQ's collectors?



The answers to these, as far as I know, are the following:


  1. Different results will come out. This is not a test, in its current form, that will generate some sort of standardized number to compare across systems.

  2. Potentially a lot. Try running a dozen of these at a time and you'll see that very quickly.

  3. Performing tests on certain types of events to find a bottleneck is probably a good idea, especially if some events differ significantly from others in ways known to parse better or worse.

  4. Use the same test machine, in the same state, for all tests. Copying over the collector development directory from one system to another is trivial, so do that for best results.

  5. NetIQ does not publish those numbers, probably (though I do not know) for some of the reasons mentioned above. The numbers, in a vacuum, do not mean anything.



The biggest benefit from this performance testing is that you can see what a collector can do on your system. If you develop a lot of collectors and use a consistent system for testing, you can get an idea of how well or poorly a collector performs relative to other collectors tested on the same system in the same state. If you get better results on one than another, you can look more closely at the slower one and see if there are places to improve. The biggest drawback of this, in my opinion, is that the number 15,039 EPS does not mean anything when the collector is moved into Sentinel, or when the testing happens on your machine vs. mine, or for event X instead of event Y. Why does a more-complex collector of mine, handling similar data in a test on the same system, reliably get around 25,000 EPS? Because we now know how to run these tests with 'ant', it may make sense to have different input files with only one type of event each, and then a script to try them all and print out results for each, so that the speed of each type of event can be ascertained automatically; see the sketch below. It's a trivial exercise, but it's an exercise left to the reader.
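
As a starting point for that exercise, here is a minimal sketch; the per-type dump file names are hypothetical, and it assumes ant's prompts read from standard input so the answers can be piped in (the blank lines accept the default values for the three numeric prompts):

#Hypothetical dump files in ../tests/data, each containing only one type of event
for dump in ../tests/data/eventTypeA.dump ../tests/data/eventTypeB.dump; do
  echo "=== ${dump} ==="
  printf '%s\n\n\n\n' "${dump}" | ant 'Run Collector Performance Test' | grep 'EVENTS_PER_SECOND'
done
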

I have tried to come up with some ways to provide semi-standardized numbers for this process, but this is a work in progress. If you want to continue, read on, but know that this is experimental and ventures off into other geeky areas of Linux. Within Linux (the kernel) there is the idea of a cgroup (control group), which is used for a lot of things throughout Linux, from control of arbitrary processes to implementation of containers in LXC-land. The reason I bring them up is that statistics on cgroups can be gathered fairly easily. These statistics can be about processes within a given cgroup, meaning you can get an idea of how much work (processor time) actually went into a given process or group of processes, regardless of the rest of the system. Setting this up is not hard but, unlike most things with the SDK, does require root privileges, as we are telling the kernel to do things. Those interested in a few more details about cgroups may find this article interesting: https://www.kernel.org/doc/Documentation/cgroups/cpuacct.txt

In summary, we're going to set up a cgroup to monitor the current shell (and, by implication, any sub-processes) from which we will call 'ant' to do our performance tests; we'll also show the slightly-better approach of monitoring the ant processes themselves, but we will get to that later. The cgroup file we check will return a number in USER_HZ, which is basically the number of jiffies consumed while the process was being monitored. From there we can convert from USER_HZ to seconds of actual processor time used by the process, and then calculate a rough estimate of cycles used during that time (processor speed (cycles per second) multiplied by the time spent working gives the cycles spent on something). Dividing that number of cycles by the number of events processed in that time period gives us the number of cycles needed for an event to be parsed. With the theory out there to be critiqued, corrected, etc., here is how you do it.

First, your shell has a process ID; find it using this simple command:

echo $$
5728


We can use this, specifically if this shell is the one from which you'll call the performance tests, but there is another option that may be a little simpler, and is certainly a bit more accurate, assuming you only have one performance test running at a time. One way or another, we are going to give a new cgroup a list of processes/threads to monitor for us, whether that is the shell that launches the test (implying all of its child processes too, or else this would not make much sense) or the individual processes/threads later.

Now let's create the new cgroup; we'll add a "directory" within the 'cpuset' and 'cpuacct' sections of /sys/fs/cgroup and name it 'sent-sdk' for simplicity.

sudo mkdir /sys/fs/cgroup/cpuset/sent-sdk
sudo mkdir /sys/fs/cgroup/cpuacct/sent-sdk


Next we'll pin anything in this group to a certain processor, specifically processor zero. This is mostly so we only need to figure out the speed of this one processor, as compared to figuring out the speed of multiple processors where the test MAY be running and then hoping for the best; it is also partially because pinning the process to a single processor should reduce fluctuation in the CPU's level of work, since the CPU never gets a break from the process being migrated to another available processor.

#0 is the processor to be used, vs 1, 2, 6, 7, etc.
echo 0 | sudo tee /sys/fs/cgroup/cpuset/sent-sdk/cpuset.cpus
#cpusets also need a memory node assigned before tasks can be added; on most single-node systems this is node 0
echo 0 | sudo tee /sys/fs/cgroup/cpuset/sent-sdk/cpuset.mems


If not done already, start the ant process; if you are going to link the shell itself to a cgroup, do so before starting ant or child processes will not be automatically put into the group. If you want to just add relevant child processes, you can do that at any time before the real performance work gets fully started.
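
For the shell-based route, adding the shell's PID (the 'echo $$' value from earlier) to both groups might look like this:

#Add the current shell, and therefore any commands it launches afterward, to both cgroups
echo $$ | sudo tee /sys/fs/cgroup/cpuset/sent-sdk/tasks
echo $$ | sudo tee /sys/fs/cgroup/cpuacct/sent-sdk/tasks


Alternatively, to pick up the relevant processes/threads directly: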

for i in $(pgrep -wf 'Run Collector Performance Test'); do echo ${i} | sudo tee /sys/fs/cgroup/cpuset/sent-sdk/tasks; done
for i in $(pgrep -wf 'Run Collector Performance Test'); do echo ${i} | sudo tee /sys/fs/cgroup/cpuacct/sent-sdk/tasks; done


These commands basically look for processes/threads with 'Run Collector Performance Test' in their command line and then write each one to the appropriate 'tasks' file, putting them under the cgroup's control. An interesting quirk of cgroups is that you can only assign one "task" at a time; these are pseudo-files, after all, and as a result have some real limitations compared to what you may be used to at the command line. Some of those limitations are very good, though maybe unintuitive.
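
To confirm what ended up in a group, you can read the same pseudo-file back at any time:

cat /sys/fs/cgroup/cpuacct/sent-sdk/tasks
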

Next we'll go into the directory where the stats will be accumulated:

cd /sys/fs/cgroup/cpuacct/sent-sdk/


Be sure that the test is running at this point, whether by finally getting past its delayed start or by manually starting it with one last press of [enter]. While the test runs, we need to see how fast the processor is actually running, which can be seen in /proc/cpuinfo for each processor. Since we're pinning the processes to the first processor, we'll look for that processor's average speed over a period of time. Unfortunately, this probably requires manual work unless the start of the following loop can be timed to coincide with a window when the processes are known to be running the test, starting after the test starts and stopping before the test ends, which depends on the speed of the collector, the number of events sent, etc. Just to give it my best shot, I'll try to make it completely automated by delaying twenty seconds (about how long the ant build delays before starting the real performance part of the test after the last user input is provided) and then calculating the average over twenty seconds of testing:

sleep 20  #the performance test appears to be delayed twenty seconds after the last [enter] providing requested data.
for i in {0..19}; do grep 'cpu MHz' /proc/cpuinfo | head -1 | awk '{printf "%s+", $NF}' && sleep 1; done | sed -e 's/^/(/' -e "s/[+]$/) \/ 20\n/" | bc -l


Once the processing is complete, retrieve stats from the appropriate cpuacct.stat file:

cat /sys/fs/cgroup/cpuacct/sent-sdk/cpuacct.stat


Add the two numbers (user and system) together and divide by 100:

echo '('$(awk '{printf "%s+", $2}' /sys/fs/cgroup/cpuacct/sent-sdk/cpuacct.stat)0')/100' | bc -l


This gives us the number of seconds of CPU time used, by taking the number of jiffies (both user and system) and dividing by 100 jiffies per second, which is the usual value but can be confirmed using python or something like that, as shown below:

python
>>> import os
>>> os.sysconf(os.sysconf_names['SC_CLK_TCK'])
100


Finally, multiply this number of CPU-seconds by the average speed of the CPU (CPU 0) during the test, as retrieved earlier; that product is the total number of cycles spent on the test.

Then divide that total by the number of input events (and possibly by the number of output events too, separately, for another data point).

The full equation, for those oriented that way, is:
The average CPU speed of CPU zero (0) during the performance test, multiplied by the number of CPU-seconds (jiffies / 100) consumed by the performance test, divided by the number of events used for testing. If I processed 2,000,000 (two million) events and the test consumed 18000 USER_HZ (jiffies), or about three minutes of CPU time, and my processor was averaging 2600 MHz, then I would do the following calculation:
((18000/100) * 2600000000) / 2000000 == 234,000 cycles per event.
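
Putting that together at the shell, with placeholder values standing in for your measured numbers:

#JIFFIES = user + system from cpuacct.stat; MHZ = average CPU speed from the sampling loop; EVENTS = events processed
JIFFIES=18000; MHZ=2600; EVENTS=2000000
echo "((${JIFFIES} / 100) * ${MHZ} * 1000000) / ${EVENTS}" | bc
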

Based on this we loosely have some kind of standard way to identify how well a collector performs on a given set of events. This number should be more comparable between systems and collectors than a raw EPS figure because it removes errors introduced by measuring with a wall clock (which does not factor out other things taking up the CPU as well as this calculation does), removes CPU speed from the equation, and in the end yields an absolute measure of cycles per event parsed. The biggest drawback that I can see is the rough averaging of CPU speed in MHz over part of the testing time period. Since the average is generated from samples taken at one-second intervals it is not incredibly precise, though since the processes are pinned to that CPU it hopefully stays as busy as possible, keeping the range of input samples low; this would likely be a bigger problem if the OS were allowed to migrate the threads involved to other processors/cores via standard scheduling. Another potential drawback with this standard of measurement is that it is platform-dependent, though any Sentinel customers doing development should be able to easily copy the SDK over to a Sentinel system and run the tests there in a similar way, even if developing on an alternative platform like Mac OS X instead of openSUSE or SUSE Linux Enterprise.

Even if the extra steps to arrive at a CPU-leveled number are not used, numbers from a given environment can still be compared with others from the same system running tests in the same way. The biggest limitation, as things stand now, is that comparing my raw results with yours is almost certainly meaningless, so there is no reliable benchmark to which we can all aspire. There may be better ways of doing performance testing, or better ECMAScript engines, or more-absolute measures of performance in the future, but for now this is all I can conjure.

Any input from others is appreciated; have you tried coming up with a more-standardized way of testing? What kind of numbers do you get from your current system? If you have done semi-objective performance testing in other areas of IT, what tools can be used, preferably in a cross-platform way? Any other tips/tricks in the area of performance worth noting with ECMAScript?

Happy computing!
