Leverage Vertica to offer UBA solutions using NetIQ Access Manager
User Behavioral Analytics (UBA) is a security process that observes the normal patterns of users’ behaviour and uses machine learning algorithms or statistical methods to detect deviations from those patterns. The premise of UBA is that it is easier to steal a user’s credentials than to mimic the user’s behaviour once the intruder has taken control of the system. In NetIQ Access Manager (NAM), UBA can strengthen risk-based authentication (RBA) by adding user behavioral analytics to the risk assessment. With UBA, we can use users’ behavioral data as another parameter, on top of existing parameters such as geolocation or cookies, to assess the risk of the user.
With Vertica, an advanced SQL-based in-database analytics solution, joining the long list of product suites offered by Micro Focus, we gain a strong foothold in the user behavioral analytics domain. In this cool solution, we intend to offer UBA to customers of NetIQ Access Manager (NAM) by applying the in-database machine learning provided by Vertica to the user behavioral data generated by NAM. We also elaborate on the major hurdles to clear in a NAM-Vertica integration, in case someone wants to try other use-cases.
The use-case that we demonstrate is to detect malicious behaviour of NAM users and to take an action based on it using RBA. We already know how RBA strengthens security: it assesses the risk of a user using geolocation, cookie context, and so on, and then either asks for step-up authentication or takes some other action. In our use-case, we use the user’s behavioral context in the form of a custom risk rule to assess the user’s risk. The behavioral context is derived from parameters such as login success count, login failure count, number of distinct applications accessed, weekday, weekend, business hours and non-business hours, for every single day of the user’s activity. These parameters are stored in Vertica’s SQL-based database, and machine learning (ML) algorithms are applied to them for each user to generate an ML model. We also implement and deploy a custom risk rule, which runs our test data against the model in Vertica through an SQL query, classifies the attempt as malicious or benign, and takes the configured action.
For example, suppose a user John in the NAM system shows, over a period of 30 days, a pattern of approximately 10 logins and 5 distinct applications accessed (out of the 40 he has access to) during non-business hours on a weekday. If an authentication attempt is then made on his behalf on a weekday during non-business hours and brings his count of applications accessed to 50, the machine learning model may classify it as malicious activity, and he may be asked for step-up authentication or denied access, depending on the configured action.
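The weekday/weekend and business/non-business parameters can be derived from the timestamp of each event. The following is a minimal sketch, not the code used in the solution: the class name LoginTimeContext is hypothetical, and the business hours of 09:00 to 18:00 are an assumption that should be adjusted to the organisation.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

/** Hypothetical helper: derives the day and hour context of one login event. */
final class LoginTimeContext {
    // Assumed business hours: 09:00 (inclusive) to 18:00 (exclusive).
    static final int BUSINESS_START_HOUR = 9;
    static final int BUSINESS_END_HOUR = 18;

    static boolean isWeekend(LocalDateTime when) {
        DayOfWeek day = when.getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }

    static boolean isBusinessHours(LocalDateTime when) {
        int hour = when.getHour();
        return hour >= BUSINESS_START_HOUR && hour < BUSINESS_END_HOUR;
    }

    /** Returns a label such as "weekday/non-business". */
    static String label(LocalDateTime when) {
        return (isWeekend(when) ? "weekend" : "weekday") + "/"
                + (isBusinessHours(when) ? "business" : "non-business");
    }
}
```

A login by John at 22:30 on a Monday would be labelled "weekday/non-business", which is the context used in the example above.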
At a high level, achieving this use-case, or any other use-case that leverages Vertica with NAM, involves the following steps:
- Configure NAM to generate relevant events that provide sufficient data to derive the behavioral context of users by generating a machine learning model in Vertica.
- Deploy Apache Kafka on your NAM server or another server. Kafka is used as an intermediary message queue between NAM and Vertica.
- Deploy our custom Apache Kafka Streaming and Source Connector applications on the server where Kafka is deployed.
- Have a standalone Vertica setup ready, and configure Vertica to launch a data streaming job scheduler.
- Implement a custom rule class and deploy it on the NAM server.
Let’s delve deeper into these steps and see what we have to do for each of them.
1. Configure NAM to generate relevant events
To derive the user’s behavioral context in Vertica, we need to configure NAM to generate audit events and send them to Vertica. For our use-case, we needed 4 existing audit events; for other use-cases, one can enable any audit events at either the Identity Server (IDP) or the Access Gateway (AG).
- Click Devices > Identity Server > Servers > Edit > Auditing and Logging.
- In the Audit Logging section, select Enabled.
- Check “Login Consumed”, “Login Consumed Failure” and “Federation Request Sent”, and apply the changes and update IDP.
- Click Devices > Access Gateways > Edit > Auditing.
- Check “Application Accessed”, and apply changes and update Access Gateway.
After enabling the audit events, we should configure the logging server, which will receive our audit events through a syslog server. Here, we specify the IP address of the server on which Kafka is deployed and the port on which the syslog server on that server is listening (1290). One can deploy Kafka either on one of the NAM nodes or on another server. If Kafka runs on a separate node, that node should have a syslog server deployed as well.
- To specify the logging server, click Auditing.
- Select Syslog and, from the drop-down list, select Send to Third party, then specify the IP address and port. Apply the changes and update IDP and AG if necessary from the admin console.
- SSH into your NAM setup, edit the /etc/Auditlogging.cfg file and set “FORMAT” to “CSV”.
- SSH into your NAM setup, edit the /etc/rsyslog.d/nam.conf file and add these lines if they are not already present:
$template ForwardFormat,"%TIMESTAMP:::date-rfc3164% %HOSTNAME% %syslogtag:1:32%%msg:::sp-if-no-1st-sp%%msg%\n"
local0.* /var/log/NAM_audits.log;ForwardFormat
These lines are the content of the nam.conf file. The template specifies the syslog message format in which the audit events are sent, and the “local0.*” line specifies the file to which the syslog server dumps the audit events from NAM. This file is the source of the stream for Kafka.
Note: If one doesn’t want to send events to a file but rather to an application listening on a port, from which Kafka will poll, that line can instead be specified as “local0.* @ipaddress:port;ForwardFormat” (rsyslog uses a single @ for UDP forwarding and @@ for TCP).
- After editing the file, restart the syslog service using “rcsyslog restart”, and also restart IDP and AG.
2. Deploy Apache Kafka
Kafka is a distributed streaming platform that acts as a message bus between Vertica and third-party applications. Data from NAM is loaded into the Vertica database via Kafka. The Kafka cluster stores streams of records in categories called topics.
- Install the Kafka binaries on either your NAM node or another node.
- Start the ZooKeeper server using the ZooKeeper start script (zookeeper-server-start.sh). ZooKeeper is a centralized service provided by Apache for maintaining configuration information, naming, providing distributed synchronization, and providing group services for Kafka clusters and topics.
- Start the Kafka server using the Kafka start script (kafka-server-start.sh) provided with the Kafka binaries.
- Create the topics your use-case needs using the topic script (kafka-topics.sh) provided with the Kafka binaries. For our use-case, we created 5 topics.
3. Deploy custom Apache Kafka Streaming and Source Connector applications
- Kafka receives streams of records into its cluster from external third-party applications through a “Source Connector”. There are several open-source third-party source connectors for Kafka, and we can also implement our own custom source connectors using the Connector API provided by Kafka. These source connectors are standalone Java or Scala applications that run on the Kafka server and continuously poll for data from the external application, using the data polling logic and data format processing logic that they implement. For our use-case, we created a custom source connector that keeps polling for data from the file into which NAM audit events were dumped in step 1, converts each syslog message into a JSON format that Kafka can understand, and sends it to a topic named “testcustom”.
- Kafka also supports streaming and processing of records within its cluster through the Streams API, with which we can create Java or Scala streaming applications that transform the data received from the third-party application (NAM) into a format to which we can apply our business logic. In our case, the data is transformed into a format that can be loaded into the Vertica database and to which ML algorithms can be applied. The Streams API provides many operations to apply to the data, such as filter, map, and groupBy, to name a few.
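The syslog-to-JSON conversion performed by the source connector can be sketched as follows. This is an illustration in plain Java and not the connector we deployed: the class AuditLineParser is hypothetical, it does not use the Kafka Connector API, and the CSV column order (eventId, userName, target) is an assumption, since the real NAM audit layout is not described here. It relies only on the ForwardFormat template from step 1, in which a fixed-width RFC 3164 timestamp is followed by the host name, the syslog tag, and the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the parsing step inside a source connector. */
final class AuditLineParser {
    // Assumed CSV column order; the real NAM audit layout may differ.
    static final String[] CSV_FIELDS = {"eventId", "userName", "target"};

    /** Splits one ForwardFormat line into header parts and CSV fields. */
    static Map<String, String> parse(String line) {
        // An RFC 3164 timestamp ("Jun  4 22:30:01") is always 15 characters.
        if (line.length() < 17 || line.charAt(15) != ' ') {
            throw new IllegalArgumentException("Not a ForwardFormat line: " + line);
        }
        String[] parts = line.substring(16).split(" ", 3); // host, tag, message
        if (parts.length < 3) {
            throw new IllegalArgumentException("Missing tag or message: " + line);
        }
        Map<String, String> record = new LinkedHashMap<>();
        record.put("timestamp", line.substring(0, 15));
        record.put("host", parts[0]);
        // Naive CSV split: quoted fields containing commas are not handled.
        String[] values = parts[2].trim().split(",", -1);
        for (int i = 0; i < CSV_FIELDS.length && i < values.length; i++) {
            record.put(CSV_FIELDS[i], values[i].trim());
        }
        return record;
    }

    /** Serializes the record as a flat JSON object of string values. */
    static String toJson(Map<String, String> record) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (json.length() > 1) json.append(',');
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    // Minimal escaping: only backslashes and double quotes are handled.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

In a real connector, the resulting JSON string would be the value of the record sent to the “testcustom” topic.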
In our use-case, we created 2 Java-based Kafka streaming applications using the Streams API and deployed them on our Kafka node. The first streaming application transforms the records from the “testcustom” topic into an intermediate format that will be used to aggregate the data for the ML model, and splits the transformed records into two branches/topics named “mlmodeltopic” and “evaltabletopic”, which become the training dataset and the test dataset for the ML model respectively.
The second streaming application takes the data from the “mlmodeltopic” and “evaltabletopic” topics, periodically aggregates the login success count, login failure count, and application accessed count over a window of 24 hours, grouped by user, and pushes the data aggregated over a day into the “finaltrainingmodeltopic” and “finalevaltabletopic” topics respectively. The data in these two topics is finally loaded into the Vertica database.
Note: The number of streaming applications to implement, and their functionality, depends on the use-case.
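The aggregation done by the second streaming application can be pictured with the plain-Java sketch below. It is not Kafka Streams code: it only mimics a group-by-user aggregation over 24-hour tumbling windows so that the logic is visible without a Kafka cluster. The class DailyAggregator and the event type names are hypothetical, and windows are assumed to be aligned to UTC midnight.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of a per-user, 24-hour tumbling-window aggregation. */
final class DailyAggregator {
    static final long WINDOW_MILLIS = 24L * 60 * 60 * 1000;

    /** Aggregated counters for one user in one window. */
    static final class DailyStats {
        int loginSuccess;
        int loginFailure;
        final Set<String> distinctApps = new HashSet<>();
    }

    // Key: "user@windowStartMillis".
    private final Map<String, DailyStats> windows = new HashMap<>();

    /** type is "LOGIN_SUCCESS", "LOGIN_FAILURE" or "APP_ACCESS" (assumed names). */
    void add(String user, long epochMillis, String type, String app) {
        // Round the timestamp down to the start of its 24-hour window.
        long windowStart = Math.floorDiv(epochMillis, WINDOW_MILLIS) * WINDOW_MILLIS;
        DailyStats stats = windows.computeIfAbsent(user + "@" + windowStart,
                k -> new DailyStats());
        switch (type) {
            case "LOGIN_SUCCESS": stats.loginSuccess++; break;
            case "LOGIN_FAILURE": stats.loginFailure++; break;
            case "APP_ACCESS": stats.distinctApps.add(app); break;
            default: break; // ignore event types this sketch does not model
        }
    }

    /** Returns the counters for a user and window start, or null if none exist. */
    DailyStats get(String user, long windowStart) {
        return windows.get(user + "@" + windowStart);
    }
}
```

Each DailyStats entry corresponds to one record that would be pushed to “finaltrainingmodeltopic” or “finalevaltabletopic”.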
4. Launch a data streaming job scheduler at Vertica
- We have to create as many tables in the database as there are Kafka topics to read data from. The table names must be the same as the topic names in Kafka, and the schema and field names of each table must match the schema and field names of the records in the corresponding topic.
- The data streaming job scheduler is a plugin used by Vertica to periodically stream data from a Kafka topic to the respective target table in the Vertica database. Using Vertica’s vkconfig tool, we can create a job scheduler and configure it to periodically stream records from a specific Kafka cluster and topic. To read from multiple topics in a Kafka cluster, we need to create as many schedulers. While creating a scheduler for a topic, we link the topic with the table created in the first step, into which we want to load the data.
Once we launch these schedulers, data is loaded into the tables as and when it appears in the Kafka topics.
For our use-case, since we had decided to divide our data into training and test datasets in topics named “finaltrainingmodeltopic” and “finalevaltabletopic”, we created 2 tables with the same names and the same schema as the data in these topics, and also created two schedulers.
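To illustrate the rule that the table name and columns must mirror the topic name and record fields, here is a small sketch that builds the CREATE TABLE statement from a topic name. The helper TopicTableDdl is hypothetical, and the column names and types are assumptions made for illustration; the real columns must match whatever fields the streaming application writes. Identifiers are not validated, so this should only be used with trusted names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that derives a table definition from a topic's record layout. */
final class TopicTableDdl {
    /** Builds a CREATE TABLE statement that uses the topic name as the table name. */
    static String createTable(String topic, Map<String, String> columns) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE ").append(topic).append(" (");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) ddl.append(", ");
            ddl.append(col.getKey()).append(' ').append(col.getValue());
            first = false;
        }
        return ddl.append(")").toString();
    }

    /** Assumed record layout for the aggregated topics; field names are illustrative. */
    static Map<String, String> dailyStatsColumns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("user_name", "VARCHAR(128)");
        cols.put("activity_date", "DATE");
        cols.put("login_success", "INT");
        cols.put("login_failure", "INT");
        cols.put("apps_accessed", "INT");
        return cols;
    }
}
```

Calling createTable once for “finaltrainingmodeltopic” and once for “finalevaltabletopic” yields the two table definitions described above.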
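Refer to the Vertica documentation for more details on setting up schedulers.
Note: The final workflow looks like this: NAM audit events are written by the syslog server to a file, the Kafka source connector reads them into a topic, the Kafka streaming applications transform and aggregate them, the Vertica schedulers load the aggregated topics into tables, and the custom risk rule in NAM queries the ML model in Vertica.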
5. Deploy custom rule jar in NAM server
- We create a custom rule class as described in the NAM documentation: https://www.netiq.com/documentation/access-manager-44/nacm_enu/?page=/documentation/access-manager-44/nacm_enu/data/rba-custom-rule-sdk.html
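The code snippet for communicating with Vertica will look something like this:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Connect to the Vertica DB; try-with-resources closes everything afterwards
try (Connection conn = DriverManager.getConnection("jdbc:vertica://ip:5433/vertica", "dbadmin", "xxxxx");
     Statement statement = conn.createStatement();
     // kmeans_Query holds the Vertica SQL query for the machine learning model
     ResultSet myResult = statement.executeQuery(kmeans_Query)) {
    // Read the results here, e.g. the cluster assigned to the test data
}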
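As a rough idea of what kmeans_Query in the snippet above might contain, the sketch below builds two queries around Vertica’s KMEANS and APPLY_KMEANS SQL functions: one to train a model and one to assign evaluation rows to clusters. The class KMeansQueries, the model name uba_kmeans, and the column names are all assumptions for illustration, and the exact function parameters should be checked against the documentation of your Vertica version.

```java
/** Hypothetical query builders; model, table, and column names are assumptions. */
final class KMeansQueries {
    static final String MODEL = "uba_kmeans";
    static final String FEATURES = "login_success, login_failure, apps_accessed";

    /** Builds a query that trains a k-means model with k clusters. */
    static String train(String trainingTable, int k) {
        return "SELECT KMEANS('" + MODEL + "', '" + trainingTable + "', '"
                + FEATURES + "', " + k + ")";
    }

    /** Builds a query that assigns one user's evaluation rows to clusters. */
    static String apply(String evalTable) {
        return "SELECT APPLY_KMEANS(" + FEATURES
                + " USING PARAMETERS model_name='" + MODEL + "') AS cluster_id"
                + " FROM " + evalTable + " WHERE user_name = ?";
    }
}
```

The apply query contains a ? placeholder for the user name, so it is meant to be run through a PreparedStatement rather than a plain Statement.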
- We configure a risk policy in the admin console with our custom rule, specifying the class name.
- In this custom class, we create connections to the Vertica database using JDBC. The drivers can be found at https://my.vertica.com/download/vertica/client-drivers/
- Based on the business logic, the ML model is created in the Vertica database, and the test data is run against the model in the database through ordinary SQL queries over JDBC connections, all from within this custom class.
- Vertica supports a list of ML algorithms in its database that can be used through ordinary SQL queries. We use k-means clustering, an unsupervised learning algorithm that clusters data into groups based on their similarity. Using k-means, you can find k clusters of data, represented by centroids. To know more about the k-means clustering algorithm, refer to the Vertica documentation.
- We use an unsupervised algorithm like k-means because of the data involved. The users’ access pattern training data doesn’t have an outcome associated with it, i.e. the training dataset for a user’s access pattern doesn’t indicate whether an attempt was authentic or fraudulent.
- This custom risk rule always evaluates to true until 30 days of user behavioral data has been collected. Once 30 days’ data is collected, we use SQL queries on the Vertica database to find and analyse good and bad clusters, and based on the cluster assigned to the test data, the risk rule evaluates to either true or false. Refer to the Vertica documentation for a list of the different ML algorithms it supports.
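The decision made by the custom risk rule can be summarised in the sketch below. It is not the NAM custom rule SDK interface, and the class BehaviorRuleDecision is hypothetical. In the solution the cluster assignment happens inside Vertica; the local nearest-centroid computation here only illustrates what that assignment means. Which clusters count as “good” is assumed to have been decided beforehand by analysing the clusters.

```java
import java.util.Set;

/** Sketch of the rule's decision logic; not the NAM custom rule SDK interface. */
final class BehaviorRuleDecision {
    static final int REQUIRED_DAYS = 30;

    /** Returns the index of the centroid closest to the point (Euclidean distance). */
    static int nearestCluster(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double d = point[i] - centroids[c][i];
                dist += d * d; // squared distance is enough for comparison
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** True until enough history exists; afterwards true only for a "good" cluster. */
    static boolean evaluate(int daysCollected, double[] today,
                            double[][] centroids, Set<Integer> goodClusters) {
        if (daysCollected < REQUIRED_DAYS) {
            return true;
        }
        return goodClusters.contains(nearestCluster(today, centroids));
    }
}
```

With centroids near John’s usual behaviour (about 10 logins and 5 applications) and near an outlier pattern (about 50 applications), a day with 48 applications accessed falls into the second cluster and the rule evaluates to false.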
This solution is a first initiative to showcase Vertica integration with NAM. The machine learning algorithm that we have used for our use-case is just for demonstration; one can use any algorithm that suits the use-case. Moving forward, we can add other parameters to the same use-case, such as a classification of applications based on importance. Besides, this is just one use-case that we have demonstrated. UBA can also be applied to use-cases such as detecting brute-force attacks, detecting compromised credentials or profiles, detecting insider threats, detecting breaches of classified and protected data, predicting the license usage of a product, and so on.