
Dynamic Corpus Functionality, a unique approach to ingesting data

Micro Focus Contributor

Micro Focus IDOL has just released some fantastic functionality that empowers data collection. We have added artificial intelligence (AI) at the point of ingest, so that decisions about what data to collect, and how, can be made in real time. This lets the data analyst gather data in a way that was previously unavailable, which I will now explain through an analogy with gold mining.

There are two ways to mine gold. In open-cast mining you dig out a large volume of rock in the hope that it contains gold ore, then filter the valuable gold from a vast amount of waste in a later process. Alternatively, you can follow the seams of gold with tunnels and extract only gold-bearing rock. Up to a certain size the first process is more efficient, but it is distinctly limited in how deep it can go: Bingham Canyon in the USA, the world’s deepest open-cast mine, reaches 1.2 kilometers (0.75 miles). Tunneling, although more costly, is precise and can follow gold with confidence to much greater depths. AngloGold Ashanti’s Mponeng mine in South Africa reaches an incredible 3.9 kilometers (2.4 miles), over three times the depth of open-cast mining.

If you exchange the concept of gold for data (quite apposite in today’s world), open-cast mining corresponds to setting up ingestion against your repository and having to specify exactly how deep to ingest. For enterprise repositories this may be possible, but for others, such as websites, the complete size may be unknown, especially in the case of the Dark Web. With our new Dynamic Corpus Functionality (DCF), powerful AI decides in real time how pertinent each document is as it is ingested, and the document is ingested or rejected accordingly. Any links contained within the document are also followed, but if DCF decides that a path is not producing further pertinent data, crawling of that path is terminated. In this way no crawl depth needs to be predefined for a repository, and all paths are explored to discover relevant documents.
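
To make the “follow the seam” idea concrete, here is a minimal Python sketch of a relevance-guided crawl with path pruning. It is an illustration under stated assumptions, not the IDOL API: the `relevance` function is a hypothetical keyword-overlap stand-in for DCF’s AI, and the names `Page`, `dynamic_crawl`, `threshold` and `patience` are invented for this example. There is no depth limit; instead each path carries a count of consecutive non-pertinent documents, and a path is abandoned once that count reaches `patience`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    text: str
    links: list[str] = field(default_factory=list)


def relevance(text: str, topic: set[str]) -> float:
    """Hypothetical stand-in for DCF's AI: fraction of topic terms present in the text."""
    if not topic:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def dynamic_crawl(web: dict[str, Page], seeds: list[str], topic: set[str],
                  threshold: float = 0.5, patience: int = 2) -> list[str]:
    """Breadth-first crawl with no depth limit (hypothetical sketch).

    Each queued URL carries the number of consecutive non-pertinent documents
    seen on the path that led to it. A pertinent document resets the count;
    once `patience` misses in a row accumulate, that path's links are dropped.
    """
    ingested: list[str] = []
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, misses = queue.popleft()
        page = web.get(url)
        if page is None:              # dead link: nothing to score
            continue
        if relevance(page.text, topic) >= threshold:
            ingested.append(url)      # pertinent: ingest and reset the path's miss count
            misses = 0
        else:
            misses += 1               # not pertinent: reject the document
            if misses >= patience:
                continue              # path judged unproductive: stop following it
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append((link, misses))
    return ingested


if __name__ == "__main__":
    web = {
        "a": Page("gold mining seams", ["b", "c"]),
        "b": Page("gold ore tunnels mining", ["d"]),
        "c": Page("weather report", ["e"]),
        "d": Page("menu", ["g"]),
        "e": Page("sports scores", ["f"]),
        "f": Page("gold mining deep seams", []),
        "g": Page("mining gold depth", []),
    }
    print(dynamic_crawl(web, ["a"], {"gold", "mining"}))  # ['a', 'b', 'g']
```

In this toy graph, page `g` is still found because it sits behind only one irrelevant page, while `f` is never fetched because the path to it produced two misses in a row. Raising `patience` (for example to 3) recovers `f` at the cost of fetching more non-pertinent pages; that trade-off between coverage and wasted effort is the design choice the pruning rule encodes.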

Another great benefit of DCF is that only high-quality data is brought back. You therefore don’t waste valuable storage on irrelevant data, and subsequent analysis is not distorted by high levels of noise.

A final but important benefit of DCF comes from using its AI abilities to identify material that matches certain criteria and then choosing not to ingest it. This ability specifically addresses issues associated with GDPR by blocking the ingest of Personally Identifiable Information (PII) at the source. PII such as medical records, financial information and even images of passports can be handled safely. Furthermore, content of a sexual, violent or malicious nature can be blocked or placed in a secure index to protect staff.
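
As a rough illustration of an ingest-time gate, the Python sketch below routes each document to one of three outcomes: reject, send to a secure index, or ingest. It is a hypothetical stand-in, not how IDOL detects PII: simple regular expressions for e-mail addresses and Luhn-valid payment card numbers replace the AI classifier, and `SENSITIVE_TERMS`, `contains_pii` and `route` are names invented for this example. It handles text only; recognizing images such as passports would need a real vision model.

```python
import re

# Hypothetical detectors standing in for DCF's AI classifiers.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")  # 13-19 digits, optional separators
SENSITIVE_TERMS = {"violence", "explicit"}               # hypothetical content flags


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_pii(text: str) -> bool:
    """True if the text holds an e-mail address or a Luhn-valid card number."""
    if EMAIL.search(text):
        return True
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


def route(text: str) -> str:
    """Decide at the point of ingest what happens to a document."""
    if contains_pii(text):
        return "reject"          # block PII at the source
    if set(text.lower().split()) & SENSITIVE_TERMS:
        return "secure_index"    # keep, but away from general staff access
    return "ingest"


if __name__ == "__main__":
    for doc in ["Card 4111 1111 1111 1111 on file",
                "Order 1234 5678 9012 3456 shipped",
                "graphic violence report"]:
        print(route(doc))
```

The Luhn check is there so that an arbitrary long number, such as the order reference above, is not mistaken for a card number; a production classifier would need far broader coverage than these two patterns.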

In summary, DCF brings you the ability to perform “Search in Place”! Discover more about best-in-class cognitive search and discovery with Micro Focus IDOL.

About the Author
Micro Focus CTO IDOL
The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.