The Data Mountain
As the amount of data in the world increases exponentially, it can be increasingly difficult to get a handle on what you have, let alone how you might use it. It’s hard enough for an individual to keep track of several years of social media, email, and photos… but what about big companies? Hundreds or even thousands of users generate content, each using their own filing and organization methods. Add employee turnover, which takes the knowledge of where things are kept out of the door, and the result is an overwhelming mountain of data.
Some of it is usefully structured and searchable: the databases, the accounts. These have been carefully curated to make them easy to manage, and we rarely have difficulty finding what we need. The rest, however, is the kind of messy, unstructured data that rapidly accumulates, either because you are legally required to keep it, or because it might come in useful one day, or simply because storage is cheap and you don’t bother to delete it.
One person, devoting their entire life to the effort, would struggle to get a handle on it all, even if data production stopped today. Take into account that you are chasing a moving target, and it becomes impossible.
More importantly, it isn’t necessary for people to do this work. Most of the data mountain is the kind of dull, everyday content that can safely be ignored. We don’t need to reread a meeting invite from 2003, or look at thousands of hours of security footage of an empty corridor.
But in the midst of all that, there’s the ten minutes of security footage that shows someone breaking into the secure room, or an email that breaks insider trading regulations.
So how do you flag the important things, while consigning the ordinary to the digital equivalent of the basement archives?
Enter Micro Focus IDOL
Micro Focus IDOL is a suite of products designed around a simple philosophy:
We want you to be able to find and use the valuable data that you collect and store, with as little manual input as possible.
IDOL is not a single product; rather it is this unifying principle. Under this umbrella, there are four core areas that make up the Micro Focus IDOL product suite:
- IDOL Text Analytics – search, analytics, and data enrichment for unstructured text sources.
- IDOL Rich Media Analytics – search, analytics, and data enrichment for image, audio, and video sources.
- IDOL KeyView – file format detection, text extraction, and rendering.
- IDOL Ingest Chain – data collection and enrichment.
Each of these areas forms a part of the whole, which you can use individually or in combination to solve a variety of business problems. The power of IDOL is that you can mix and match compatible products in different groupings in a way that suits the problems you need to solve.
IDOL Text Analytics
Text Analytics can start with a simple keyword search, but it also includes many different tools to help you make the most of your unstructured text data.
A few commonly used examples include:
- Entity Extraction – IDOL Eduction allows you to extract valuable snippets of information (entities) from your text and use them to tag documents. For example, you can use this to find Personally Identifiable Information (PII) in your documents, to help ensure your compliance with regulations such as GDPR.
- Query Analytics – IDOL can take a simple search and provide insight into your data. The Find application provides visualizations, such as topic maps and timelines, so you can see at a glance the kind of results you have for a particular search. You can also perform search comparisons, or create a geographical map of your results.
- Virtual Assistant – IDOL Natural Language Question Answering provides a simple, automated tool to help your customers with common problems, reserving your support staff for the more unusual requests.
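As a rough illustration of the entity extraction idea in the list above, the following Python sketch finds two kinds of PII with regular expressions and uses the matches to tag a document. It is not IDOL Eduction and does not use any IDOL API; the patterns and the names `PATTERNS`, `extract_entities`, and `tag_document` are hypothetical, and real PII detection needs far more robust grammars than these.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return every match for each entity type found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


def tag_document(text: str) -> List[str]:
    """Return the sorted entity types that occur at least once in the text."""
    entities = extract_entities(text)
    return sorted(name for name, matches in entities.items() if matches)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or call +44 20 7946 0958."
    print(extract_entities(sample))
    print(tag_document(sample))
```

A document tagged in this way can then be routed for review or redaction, rather than someone reading it by hand.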
Importantly, IDOL also provides document security to ensure your document access restrictions remain in place in the search results and analytics in the same way as in your original repositories.
IDOL Rich Media Analytics
Rich Media Analytics provides tools for making the most of your images, videos, and audio files. It can process media from streams (such as broadcasts, or security cameras providing continuous content) or discrete files, and it can perform analytics such as text capture from images (OCR), face detection and recognition (finding and identifying faces in images), object recognition (such as logo detection), speech-to-text, and speaker recognition.
It has a wide variety of applications, including:
- Broadcast Monitoring – retrieving and analyzing content from ongoing broadcasts to find salient news stories, and processing content for analysis. You can use this for many things, from keeping track of developing news stories, to checking how often a brand logo appears in a broadcast.
- Security and Surveillance – automating security systems to detect particular events, such as abandoned luggage or traffic infractions, to augment human oversight and reduce human error.
- Personal Data Protection – finding PII in media such as text in images, faces, or number plates, and redacting it.
IDOL KeyView
On the surface, file format detection and text extraction do not sound particularly glamorous, but in practice KeyView is one of the workaday engines that makes a lot of IDOL’s most powerful functionality possible.
KeyView can detect and categorize over 1200 file formats, which allows for appropriate routing for different types of files. It detects the format by using the file content, which is more accurate than using the file extension, and it can often detect different versions of a format, which might require different processing.
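To illustrate why detecting a format from the file content is more reliable than trusting the extension, here is a minimal Python sketch that checks the leading bytes of a file against a few well-known signatures. It is not KeyView's API or its detection method; the `SIGNATURES` table and the `detect_format` function are assumptions made for illustration, and they cover only a handful of formats.

```python
from typing import Optional

# Hypothetical signature table: a few widely documented "magic numbers".
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container for formats such as DOCX
]


def detect_format(data: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # A PDF that someone has saved with a misleading ".txt" extension.
    filename = "report.txt"
    content = b"%PDF-1.7\n..."
    print(filename, "is really:", detect_format(content))
```

Because the check ignores the file name entirely, a renamed or mislabelled file is still routed according to what it actually contains.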
In addition, KeyView can extract subfiles from a variety of file formats, and filter the text from hundreds more. Text filtering allows you to create an IDOL index from your raw data. Moreover, KeyView supports many old file formats that no longer have a native viewer, allowing you to recover otherwise inaccessible content.
You can also use KeyView to export files to XML, HTML, or PDF, and to render a document for easy viewing in a Web browser.
IDOL Ingest Chain
The IDOL Ingest Chain, which retrieves and enriches your content, is another important working part that makes the rest of IDOL possible.
IDOL Connectors allow you to retrieve data from over a hundred repositories, which you can then route to KeyView for file and text extraction, and then onwards to other IDOL tools for data enrichment or indexing.
IDOL NiFi Ingest provides a front-end application, based on Apache NiFi, to allow you to easily visualize complex ingest chains and manage your document flow.
All businesses have unique problems, and IDOL provides building blocks that you can connect together in a unique way to solve them. IDOL allows you to make the most of all your valuable data.