User Behavioral Analytics (UBA) is a security process that observes users’ normal behaviour patterns and uses machine learning algorithms or statistical methods to detect deviations from those patterns. The premise of UBA is that it is easier to steal a user’s credentials than to mimic the user’s behaviour once the intruder has taken control of the system. In NetIQ Access Manager (NAM), UBA can strengthen risk-based authentication (RBA) by adding user behavioral analytics to the risk assessment. With UBA, we can use users’ behavioral data as another parameter, on top of existing parameters such as geolocation or cookies, to assess the risk of a user.
With Vertica, an advanced SQL-based in-database analytics solution, joining the long list of product suites offered by Micro Focus, we gain a strong foothold in the user behavioral analytics domain. Through this solution, we intend to offer UBA to customers of NetIQ Access Manager (NAM) by applying the in-database machine learning provided by Vertica to the user behavioral data generated by NAM. We also describe the major hurdles to clear in achieving NAM-Vertica integration, for anyone who wants to try other use-cases.
The use-case that we demonstrate is to detect malicious behaviour of NAM users and to act on it using RBA. We already know how RBA strengthens security: it assesses the risk of a user using geolocation, cookie context, etc., and then either asks for step-up authentication or takes some other action. In our use-case, we add the user’s behavioral context in the form of a custom risk rule and use it to assess the user’s risk. The behavioral context is derived from parameters such as login success count, login failure count, number of distinct applications accessed, weekday or weekend, and business or non-business hours, for every single day of the user’s activity. These parameters are stored in Vertica’s SQL-based database, and machine learning (ML) algorithms are applied to them, for each user, to generate an ML model. We also implement and deploy a custom risk rule, which runs our test data against the model in Vertica through an SQL query, classifies the attempt as malicious or benign, and takes the configured action.
For example, suppose a user John in the NAM system has, over a period of 30 days, shown a pattern of approximately 10 logins and 5 distinct applications accessed, out of the 40 he has access to, during non-business hours on a weekday. If an authentication attempt is made on his behalf on a weekday during non-business hours, bringing his count of applications accessed to 50, the machine learning model may classify this as malicious activity, and he may be asked for step-up authentication or denied access, depending on the configured action.
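To make John's example concrete, here is a minimal Java sketch of how a clustering-style check could flag such an attempt. It is only an illustration under assumptions: the class name BehaviourCheck, the two-feature layout (login count, applications accessed), the single cluster centre and the distance threshold are all hypothetical, and in the solution described here the model itself lives in Vertica rather than in Java.

```java
import java.util.List;

/** Hypothetical sketch: flag a day's activity that lies far from every learned cluster centre. */
final class BehaviourCheck {

    /** Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** True when the observation is farther than maxDistance from all centres. */
    static boolean isSuspicious(double[] observed, List<double[]> centres, double maxDistance) {
        for (double[] centre : centres) {
            if (distance(observed, centre) <= maxDistance) {
                return false; // close to a known behaviour pattern
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed centre matching John's history: {login count, applications accessed}.
        List<double[]> centres = List.of(new double[] {10, 5});
        double maxDistance = 5.0; // assumed threshold, would need tuning on real data

        System.out.println(isSuspicious(new double[] {11, 6}, centres, maxDistance));
        System.out.println(isSuspicious(new double[] {10, 50}, centres, maxDistance));
    }
}
```

With these assumed numbers, a day close to John's usual pattern stays within the threshold, while the attempt that takes his application count to 50 falls far outside it.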
The major high-level steps to achieve this use-case, or any other use-case that leverages Vertica with NAM, can be categorized into several sub-steps:
Let’s delve deeper into these steps and see what we have to do for each of them.
To derive the user’s behavioral context in Vertica, we need to configure NAM to generate audit events and send them to Vertica. For our use-case, we needed 4 existing audit events. For other use-cases, one can enable any audit events at either the Identity Server (IDP) or the Access Gateway (AG).
After enabling the audit events, we should configure the logging server, which will receive our audit events via a syslog server. Here, we should specify the IP address of the server on which Kafka is deployed, and the port on which the syslog server on that same server is listening (1290). One can deploy Kafka either on one of the NAM nodes or on another server. If Kafka runs on a separate node, that node must have a syslog server deployed as well.
$template ForwardFormat,"%TIMESTAMP:::date-rfc3164% %HOSTNAME% %syslogtag:1:32%%msg:::sp-if-no-1st-sp%%msg%\n"
The template above is taken from the nam.conf file. It specifies the syslog message format in which the audit events will be sent. The line “local0.* /var/log/NAM_audits.log;ForwardFormat” specifies the file into which the syslog server writes the audit events received from NAM; this file is the source of the stream for Kafka.
Note: If one doesn’t want to send events to a file, but rather to an application listening on a port from which Kafka will poll, that line can be written as “local0.* @@ipaddress:port;ForwardFormat” to forward over TCP (use a single @ for UDP).
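As an illustration of what a consumer of this file has to deal with, the following Java sketch splits one line written in the ForwardFormat layout into its timestamp, host name, tag and message. The class name ForwardedAuditLine and the sample line are made up for this example, and the message part is treated as opaque text because the content of the NAM audit events is not shown here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one line of NAM_audits.log written with the ForwardFormat template. */
final class ForwardedAuditLine {

    // RFC 3164 timestamp ("Mmm dd hh:mm:ss", day padded with a space), host, tag, then the message.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S{3} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) (\\S+) (\\S+) ?(.*)$");

    final String timestamp;
    final String host;
    final String tag;
    final String message;

    private ForwardedAuditLine(String timestamp, String host, String tag, String message) {
        this.timestamp = timestamp;
        this.host = host;
        this.tag = tag;
        this.message = message;
    }

    /** Parses a single line without its trailing newline; empty if the layout does not match. */
    static Optional<ForwardedAuditLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ForwardedAuditLine(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        // Made-up sample line, not real NAM output.
        parse("Mar  5 14:02:11 idp01 nam: login ok")
                .ifPresent(l -> System.out.println(l.host + " | " + l.tag + " | " + l.message));
    }
}
```

Lines read with BufferedReader.readLine() already come without the trailing newline, so they can be passed to parse directly.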
Kafka is a distributed streaming platform that acts as a message bus between Vertica and third-party applications. Data from NAM is loaded into the Vertica database via Kafka. A Kafka cluster stores streams of records in categories called topics.
In our use-case, we created 2 Java-based Kafka streaming applications using the Streams API and deployed them on our Kafka node. The first streaming application transforms the records from the “testcustom” topic into an intermediate format, which is later aggregated to create the ML model, and splits the transformed records into two branches/topics named “mlmodeltopic” and “evaltabletopic”, which feed the training dataset and the test dataset for the ML model respectively.
The second streaming application takes the data from the “mlmodeltopic” and “evaltabletopic” topics, periodically aggregates the login success count, login failure count and applications accessed count over a window of 24 hours, grouped by user, and pushes the data aggregated over a day into the “finaltrainingmodeltopic” and “finalevaltabletopic” topics respectively. The data in these two topics is finally loaded into the Vertica database.
Note: The number of streaming applications to implement, and their functionality, depends on the use-case.
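The per-user, per-day aggregation described above can be sketched in plain Java as follows. This is not the Kafka Streams code of the actual application: it only illustrates the grouping and counting logic, it uses UTC calendar days as a stand-in for the 24-hour window, and the class, enum and field names are assumptions made for the example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the daily aggregation, independent of Kafka. */
final class DailyAggregator {

    enum Type { LOGIN_SUCCESS, LOGIN_FAILURE, APP_ACCESS }

    /** One transformed audit record (field names are illustrative). */
    static final class Event {
        final String user;
        final Instant time;
        final Type type;
        final String application; // only used for APP_ACCESS events

        Event(String user, Instant time, Type type, String application) {
            this.user = user;
            this.time = time;
            this.type = type;
            this.application = application;
        }
    }

    /** Aggregated counters for one user on one day. */
    static final class DailySummary {
        int loginSuccess;
        int loginFailure;
        final Set<String> applications = new HashSet<>();

        int distinctApplications() {
            return applications.size();
        }
    }

    /** Groups events under the key "user|yyyy-MM-dd" (UTC day) and updates the counters. */
    static Map<String, DailySummary> aggregate(Iterable<Event> events) {
        Map<String, DailySummary> summaries = new TreeMap<>();
        for (Event e : events) {
            LocalDate day = e.time.atZone(ZoneOffset.UTC).toLocalDate();
            DailySummary s = summaries.computeIfAbsent(e.user + "|" + day, k -> new DailySummary());
            switch (e.type) {
                case LOGIN_SUCCESS: s.loginSuccess++; break;
                case LOGIN_FAILURE: s.loginFailure++; break;
                case APP_ACCESS: s.applications.add(e.application); break;
            }
        }
        return summaries;
    }
}
```

Each DailySummary corresponds to one row of aggregated data for a user and a day; a Kafka Streams implementation would express the same grouping with a windowed aggregation rather than an in-memory map.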
Once we launch these schedulers, data is loaded into the tables as and when it appears on the Kafka topics.
For our use-case, since we had decided to divide our data into training and test datasets in the topics named “finaltrainingmodeltopic” and “finalevaltabletopic”, we created 2 tables with the same names and the same schema as the data in these topics, and also created two schedulers.
Refer this for more details of setting up schedulers in Vertica -
Note: This is how the final workflow will look like.
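The code snippet for communicating with Vertica will look something like this:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Connect to the Vertica DB; try-with-resources closes the connection, statement and result set
try (Connection conn = DriverManager.getConnection("jdbc:vertica://ip:5433/vertica", "dbadmin", "xxxxx");
     Statement statement = conn.createStatement();
     // kmeans_Query holds the Vertica SQL query for the machine learning model; execute it and get the results
     ResultSet myResult = statement.executeQuery(kmeans_Query)) {
    // Read the classification of the attempt from myResult here
}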
This solution is a first step towards showcasing Vertica integration with NAM. The machine learning algorithm that we used for our use-case is only for demonstration; one can use any algorithm that suits the use-case at hand. Moving forward, we can add other parameters to the same use-case, such as a classification of applications based on importance. Besides, this is just one use-case that we have demonstrated. UBA can also be applied to use-cases such as detecting brute-force attacks, detecting compromised credentials or profiles, detecting insider threats, detecting breaches of classified and protected data, predicting the license usage of a product, etc.