Everything is a nail! Machine Learning in CyberSecurity

(This blog was originally published in June 2018. In February 2019, Micro Focus acquired Interset.)

Last week, we had the privilege of engaging with a select group of individuals at the IANS Forum in Seattle. During the two-day conference, Interset Director of Field Operations Jay Lillie discussed how to assess the effectiveness of different types of machine learning in cybersecurity.

Although nearly everyone in cybersecurity claims to support AI and machine learning, Jay explained that there are key differences between supervised machine learning (learning by example) and unsupervised machine learning (learning by observation). Supervised machine learning is akin to assessing human emotions by training on previously labeled visual cues, as seen in Figure A.

Machine Learning in CyberSecurity1.png

Figure A

In contrast, unsupervised machine learning is more dynamic. It’s based on learning by observation and leverages contextual clues to statistically determine when certain emotions are likely to manifest. Instead of relying on predefined labels from a series of images, unsupervised machine learning uses contextual data to recognize patterns within facial images and identify the expected emotion. 

Machine Learning in CyberSecurity2.png

Figure B

Finding the right tool for the job is the key to a successful machine learning solution in cybersecurity. Jay also shared advice on how to select a good solution: define your needs, demand vendor clarity, use what you know, and evaluate the solutions in the context of your specific needs.

Machine Learning in CyberSecurity3.png


Machine Learning in CyberSecurity4.png


View the full keynote and tech session presentations on SlideShare.

Following the tech sessions, attendees raised a number of good questions, which are captured below.


Q: Are you only looking at people and their behaviors or also servers and machines?

A: Interset continuously measures “unique normal” and inter-entity relationships for multiple types of entities, such as users, machines, files, IP addresses, projects, resources, services, shares, websites, volumes, and printers. See our behavioral analytics page for more information on how we measure “unique normal” for these 11 types of entities.

Q: How do you reset the baseline when an employee changes roles or titles?

A: There is no need to reset or “tune” our platform when organizational changes, such as changing roles or changing populations, happen. Interset uses unsupervised machine learning, a type of machine learning that discovers patterns without labels and learns on its own through observation. All of our models also learn “online,” which means that they analyze live datasets in real time. As a result, our platform dynamically and automatically adapts to organizational changes.
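
To illustrate what learning “online” can look like, here is a minimal Python sketch of a per-entity baseline that updates with every new observation and gradually forgets old behavior. It is an illustration under stated assumptions, not Interset's actual models: the `OnlineBaseline` class, the `alpha` decay rate, and the daily-login metric are all hypothetical.

```python
import math


class OnlineBaseline:
    """Exponentially weighted mean/variance for one entity's metric.

    Hypothetical illustration only; not Interset's implementation.
    """

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest observation
        self.mean = None     # no baseline until the first observation
        self.var = 0.0

    def update(self, value):
        """Fold one new observation into the baseline."""
        if self.mean is None:
            self.mean = float(value)
            return
        delta = value - self.mean
        self.mean += self.alpha * delta
        # Standard incremental update for an exponentially weighted variance.
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    def z_score(self, value):
        """Distance of `value` from the current baseline, in standard deviations."""
        if self.mean is None or self.var == 0.0:
            return 0.0
        return (value - self.mean) / math.sqrt(self.var)


baseline = OnlineBaseline()
for daily_logins in [10, 12, 11, 9, 13, 10, 11]:   # the user's old role
    baseline.update(daily_logins)

anomaly_before = baseline.z_score(40)   # 40 logins is far from the old normal

for _ in range(50):                     # the user moves to a busier role
    baseline.update(40)

anomaly_after = baseline.z_score(40)    # 40 logins is now the new normal

print(anomaly_before, anomaly_after)
```

Because older observations lose weight over time, the same value that scored as highly unusual under the old role scores close to zero once the new behavior has been observed for a while, with no manual reset.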

Q: How do you approach compromised accounts?

A: Compromised accounts result in anomalous behavior, that is, behavior different from what is expected from the account owner. Interset detects anomalies in authentication, account type, login access to unusual locations (servers, file shares, etc.), active or remote sessions, and so on, and connects these anomalies to potentially compromised accounts, machines, etc.

Q: How do you integrate with other products, such as Splunk, so they don’t have to duplicate data?

A: Interset ingests metadata generated from log files from different sources (Splunk, Active Directory, Endpoint, Fileshare, SIEM, etc.) to be analyzed with its machine learning models. Interset does not duplicate data to run analysis.

Q: Can the product be used to retain log file data? Essentially to serve as a reference copy in case source log files are modified or removed?

A: The Interset product is not designed to be a repository for log file data, although the Hadoop infrastructure can be a very efficient big data storage option for log file retention. Interset is focused on big data computation (as opposed to storage) to quickly and efficiently analyze vast volumes of security data and detect the threats that matter.

Q: How do you deal with user privacy issues, particularly GDPR?

A: The Interset threat detection platform was designed from the ground up with privacy in mind because of the large volume of data it processes. Pseudonymization using secured, one-way hashing of sensitive fields is a built-in component of Interset’s platform. Most importantly, Interset’s hundreds of models were all designed to follow the principle of data minimization: every model requires only the minimum set of columns needed for statistical processing, and columns not required by a model are always optional. Interset’s R&D team also follows secure development lifecycle practices, including security architecture, independent third-party security testing, and code analysis. For more information, read this Interset and GDPR article.
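
As a rough sketch of the two ideas above (one-way hashing of sensitive fields and data minimization), the Python example below keeps only the columns a model needs and replaces sensitive values with a keyed hash. The choice of HMAC-SHA256, the column names, and the `minimize_and_mask` helper are assumptions made for illustration; the article does not specify which hashing scheme or schema Interset uses.

```python
import hashlib
import hmac

# Hypothetical schema: the columns a model needs, and which of them are sensitive.
REQUIRED_COLUMNS = ("user", "action", "timestamp")
SENSITIVE_COLUMNS = ("user",)


def pseudonymize(value, key):
    """Return a keyed one-way hash (HMAC-SHA256) of a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_and_mask(event, key):
    """Keep only required columns and hash the sensitive ones."""
    record = {}
    for column in REQUIRED_COLUMNS:
        if column not in event:
            continue
        value = event[column]
        if column in SENSITIVE_COLUMNS:
            value = pseudonymize(str(value), key)
        record[column] = value
    return record


secret_key = b"example-key-kept-outside-the-analytics-store"  # illustrative only
raw_event = {
    "user": "alice@example.com",
    "action": "login",
    "timestamp": 1528675200,
    "home_address": "not needed by the model",
}
print(minimize_and_mask(raw_event, secret_key))
```

The same user always maps to the same token under a given key, so behavior can still be modeled per user, while the extra `home_address` column never reaches the analytics record.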

Q: Is there an ability to anonymize user data in a way that demonstrates users have not been targeted or singled-out?

A: Yes, anonymization is performed as part of data ingest. The specific user information can be masked and stored separately. Should a threat be detected, that specific user data can be retrieved and then unmasked.
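
The mask-then-unmask flow described above could be sketched as follows. This is a hypothetical illustration, not Interset's implementation: the `IdentityVault` class and the `user-` token format are invented here, and a real deployment would keep the vault in a separately secured store with access controls and auditing.

```python
import secrets


class IdentityVault:
    """Holds the token-to-user mapping apart from the analytics data.

    Hypothetical illustration of masking at ingest and unmasking on demand.
    """

    def __init__(self):
        self._token_by_user = {}
        self._user_by_token = {}

    def mask(self, user):
        """Return a stable random token for `user`, creating one if needed."""
        token = self._token_by_user.get(user)
        if token is None:
            token = "user-" + secrets.token_hex(8)
            self._token_by_user[user] = token
            self._user_by_token[token] = user
        return token

    def unmask(self, token):
        """Recover the real identity; intended only after a threat is flagged."""
        return self._user_by_token[token]


vault = IdentityVault()

# At ingest, the analytics side only ever sees the masked token.
masked_event = {"user": vault.mask("alice@example.com"), "action": "file_copy"}

# Later, if this entity is flagged, an authorized analyst can unmask it.
identity = vault.unmask(masked_event["user"])
print(masked_event["user"], "->", identity)
```

Because analysis runs on tokens rather than names, every user is scored the same way, and an identity is only revealed after the models have flagged the behavior.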