A Guide to Text Analytics From a Leader in the Industry


A Leader in Text Analytics with a Uniquely Holistic Analytics Heritage


Why is the Forrester Wave so Influential

The Forrester Wave™ is a guide for organizations weighing their technology procurement options. Forrester uses a transparent, publicly available methodology to compare leading market players, so decision-makers can make well-informed choices without spending months conducting their own research. The principal inputs in a Forrester Wave evaluation are an executive strategy briefing, a product demo session, a questionnaire, and customer references.

Micro Focus: A Leader in Text Analytics

Micro Focus was among the select companies Forrester invited to participate in its Q2 2018 Forrester Wave™ evaluation of AI-based text analytics platforms, in which Micro Focus was cited as a Leader in text analytics. The full report is The Forrester Wave™: AI-Based Text Analytics Platforms, Q2 2018.

One of the conclusions in the Forrester Wave report echoes an increasingly critical requirement for an effective analytics practice:

“Micro Focus IDOL helps clients analyze all unstructured data, not just text. Unstructured data isn't just text, it's also audio, images, and video — and Micro Focus IDOL is the only platform we evaluated that offers comprehensive analytics on all unstructured data types.” [1]

As enterprise data comes in diverse formats and from varied sources, organizations face a constant uphill battle to extract insights not only from structured data but also from unstructured types such as text, images, video, and audio. When they cannot keep up with these rapidly growing and varied data sets, the decisions they make rest on incomplete information and suffer in quality.

According to a Cisco white paper published in 2017, “Globally, IP video traffic will be 82 percent of all IP traffic (both business and consumer) by 2021, up from 73 percent in 2016.”

As rich media continues to dominate and increase its share of the overall data landscape, organizations need to unlock this massive wealth of information by tapping into ALL unstructured data, regardless of format.

Micro Focus’ uniquely holistic approach to analytics has been adopted by many organizations operating in extremely time-sensitive, high-pressure, and competitive environments, such as Spain’s Ministry of the Interior and Xi’an Panorama Data. To see how unified unstructured data analytics works, request a live demo.

Importance of Text Analytics

What is Text Analytics

Text analytics (also referred to as text mining, text mining analytics, or unstructured text analytics) is a computer science discipline that leverages natural language processing (NLP) and machine learning to extract meaning, insights, and analysis from large sets of unstructured text data. The unstructured data could take the form of call center logs, customer emails, corporate documents, consumer surveys, and social media comments. 

Why is Text Analytics Important

Text analytics identifies facts, assertions, concepts, relationships, and keywords that would otherwise remain hidden in an ad hoc or superficial review of such data. The structured data that results from text mining analytics can then be integrated into business intelligence dashboards, data warehouses, and databases and used for descriptive, prescriptive, or predictive analytics.
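As a rough illustration of how text mining turns free text into structured data, the following Python sketch extracts keywords from a few hypothetical customer comments and loads them into an in-memory SQLite table, which stands in for a data warehouse or database. The comments, the stopword list, and the `keywords` table schema are assumptions made for this example; the sketch is not meant to represent any particular product's implementation.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample of unstructured text (e.g., customer comments).
COMMENTS = [
    "The delivery was late and the package was damaged.",
    "Great support team, but delivery took two weeks.",
    "Package arrived damaged; support replaced it quickly.",
]

# A tiny, illustrative stopword list; real systems use far larger ones.
STOPWORDS = {"the", "and", "was", "but", "it", "a", "took", "two"}


def extract_keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def load_keywords(comments):
    """Turn free text into (doc_id, keyword, count) rows in a SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keywords (doc_id INTEGER, keyword TEXT, n INTEGER)")
    for doc_id, text in enumerate(comments):
        for word, n in Counter(extract_keywords(text)).items():
            conn.execute("INSERT INTO keywords VALUES (?, ?, ?)", (doc_id, word, n))
    return conn


conn = load_keywords(COMMENTS)
# Once structured, the data can be queried like any other table.
top = conn.execute(
    "SELECT keyword, SUM(n) AS total FROM keywords "
    "GROUP BY keyword ORDER BY total DESC, keyword LIMIT 3"
).fetchall()
print(top)
```

Once the keywords sit in a table, ordinary SQL aggregation surfaces recurring themes (damaged packages and delivery problems in this sample), which is the kind of output a business intelligence dashboard could consume.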

The 7 Basic Functions of Text Analytics

The core functions of text analytics include the following. 

1. Language Identification

Identify the language of the text. Is it English? Japanese? Mandarin? Hindi? Arabic? Spanish? German? Portuguese? Every language has its own idiosyncrasies, so knowing from the outset which one you are dealing with is paramount. Language identification is the foundation for every other element of unstructured data analytics, so getting it right is critical.
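One simple way to sketch language identification is to count how many common function words (stopwords) from each candidate language appear in the text. The three short stopword profiles below are illustrative assumptions: they cover only English, Spanish, and German and cannot tell apart languages written in other scripts.

```python
import re

# Small, hypothetical stopword profiles for three languages.
PROFILES = {
    "English": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "Spanish": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "German": {"der", "die", "und", "ist", "nicht", "das", "ein", "zu"},
}


def identify_language(text):
    """Return the language whose stopwords occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in stopwords)
        for lang, stopwords in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    # With no stopword hits at all, refuse to guess.
    return best if scores[best] > 0 else "unknown"


print(identify_language("The report is one of the most useful in the market"))
print(identify_language("El informe es uno de los más útiles en el mercado"))
print(identify_language("Der Bericht ist nicht das Ende"))
```

Counting stopwords works reasonably on longer passages but is fragile on short or mixed-language snippets, which is one reason production language identifiers typically rely on character n-gram statistics instead.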

2. Tokenization

Tokenization is the process of breaking text into units, called tokens, that are small enough for machine interpretation. The term token is used because text consists not only of words but also of hyperlinks and punctuation. Each language has its own tokenization requirements. Alphabetic languages such as English are fairly easy to tokenize because words are separated by spaces. Character-based languages such as Japanese and Mandarin are much harder to tokenize because they do not mark word boundaries with spaces.
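Here is a minimal regex-based tokenizer for English-like text, assuming three token types: hyperlinks, words (including contractions such as "doesn't"), and single punctuation marks. The pattern and the `tokenize` name are illustrative and not drawn from any particular toolkit.

```python
import re

# Alternatives are tried in order: URLs first, so they are not split at
# "." or "/"; then words with optional internal apostrophes; then any
# single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(
    r"https?://\S*[^\s.,!?;:]"  # hyperlink, excluding trailing punctuation
    r"|\w+(?:'\w+)*"            # word, e.g. "doesn't"
    r"|[^\w\s]"                 # one punctuation mark per token
)


def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Read the report at https://example.com/wave, it doesn't disappoint!"))
```

Because Python's `\w` also matches CJK characters, the same function returns an unspaced run of Japanese text as a single token, which shows why character-based languages need dictionary- or model-based word segmentation rather than simple pattern matching.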

3. Sentence Breaking

Determine where sentences end. In alphabetic languages such as English, certain punctuation marks (such as a full stop or question mark) indicate the end of a sentence. It’s not always that straightforward, though, because there’s no guarantee that the text will follow conventional punctuation (e.g., social media posts where users may disregard punctuation to fit the text within the platform’s character limit).
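A rule-based sentence splitter can be sketched as follows, assuming that a run of `.`, `!`, or `?` followed by whitespace ends a sentence unless the preceding word appears in a small, hypothetical abbreviation list.

```python
import re

# Hypothetical abbreviation list: a period after these does not end a sentence.
ABBREVIATIONS = {"e.g.", "i.e.", "dr.", "mr.", "inc."}


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace, unless the
    word ending at that punctuation is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s)", text):
        end = match.end()
        last_word = text[start:end].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Dr." is not a sentence boundary
        sentences.append(text[start:end].strip())
        start = end
    # Whatever remains (including a final sentence) is the last sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith reviewed the report. Was it useful? Yes!"))
print(split_sentences("great product no complaints would buy again"))
```

The unpunctuated social media post comes back as one long "sentence", which is exactly the failure mode described above; handling such text usually calls for statistical or machine-learned boundary detection.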

4. Part of Speech (PoS) Tagging

PoS tagging is the process of determining and tagging the part of speech of each token within a document. PoS tagging can tell whether a token represents an adjective, a verb, a common noun, a proper noun, or something else. 
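To show what a tagger produces, here is a toy part-of-speech tagger that assumes a tiny hand-written lexicon, a few suffix rules, and a capitalization heuristic for proper nouns. The tag names loosely follow the Universal Dependencies tag set; the lexicon and rules are hypothetical and far too small for real use, where taggers are trained on annotated corpora.

```python
# Hypothetical lexicon of known words and their tags.
LEXICON = {
    "the": "DET", "a": "DET", "from": "ADP",
    "released": "VERB", "report": "NOUN",
}

# Fallback suffix rules, checked in order for unknown lowercase words.
SUFFIX_RULES = [("ing", "VERB"), ("ed", "VERB"), ("ly", "ADV"), ("ous", "ADJ"), ("ful", "ADJ")]


def tag(tokens):
    """Return (token, tag) pairs for a list of non-empty tokens."""
    tagged = []
    for i, token in enumerate(tokens):
        lower = token.lower()
        if lower in LEXICON:
            pos = LEXICON[lower]
        elif token[0].isupper() and i > 0:
            # Capitalized mid-sentence and unknown: guess proper noun.
            pos = "PROPN"
        else:
            # Otherwise apply the first matching suffix rule, else NOUN.
            pos = next((t for s, t in SUFFIX_RULES if lower.endswith(s)), "NOUN")
        tagged.append((token, pos))
    return tagged


print(tag(["The", "analyst", "from", "Forrester", "quickly",
           "released", "a", "useful", "report"]))
```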

5. Chunking

Chunking encompasses a wide range of techniques that split a sentence into its constituent phrases. Effectively, chunking is the assignment of PoS-tagged tokens to phrases (such as verb phrases, noun phrases, and prepositional phrases). Chunking is sometimes referred to as light parsing.
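Chunking is often implemented as pattern matching over PoS tags. This sketch assumes tokens that have already been tagged (hard-coded here) and a single hypothetical noun-phrase rule: an optional determiner, any number of adjectives, then one or more nouns or proper nouns.

```python
import re

# Pre-tagged tokens, e.g. the output of a PoS tagger.
TAGGED = [
    ("The", "DET"), ("new", "ADJ"), ("report", "NOUN"), ("ranks", "VERB"),
    ("Micro", "PROPN"), ("Focus", "PROPN"), ("in", "ADP"), ("the", "DET"),
    ("leading", "ADJ"), ("group", "NOUN"),
]

# Noun-phrase rule over a space-separated tag string; the lookbehind makes
# every match start at a tag boundary.
NP_RULE = re.compile(r"(?<!\S)(DET )?(ADJ )*((NOUN|PROPN) )+")


def chunk_noun_phrases(tagged):
    """Return the text of each noun phrase matched by NP_RULE."""
    tag_string = "".join(tag + " " for _, tag in tagged)
    phrases = []
    for match in NP_RULE.finditer(tag_string):
        # Map character offsets back to token indices by counting spaces.
        first = tag_string[: match.start()].count(" ")
        last = first + match.group().count(" ")
        phrases.append(" ".join(word for word, _ in tagged[first:last]))
    return phrases


print(chunk_noun_phrases(TAGGED))
```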

6. Syntax Parsing

Syntax parsing determines the grammatical structure of a sentence. It’s an essential building block for sentiment analysis and other NLP features, and one of the most computationally intensive steps in text analytics.
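To make the cost of syntax parsing concrete, the following sketch implements the classic CKY (Cocke-Younger-Kasami) recognizer for a toy grammar in Chomsky normal form. The grammar and vocabulary are hypothetical; the point is the nested loops over span length, start position, and split point, which make the work grow with the cube of sentence length.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form:
# every rule is either A -> B C or A -> word.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("DET", "NOUN"): "NP",
    ("VERB", "NP"): "VP",
}
LEXICAL_RULES = {
    "the": {"DET"}, "a": {"DET"},
    "analyst": {"NOUN"}, "report": {"NOUN"},
    "reads": {"VERB"},
}


def cky_parse(words):
    """Return the chart; chart[i][j] holds every label spanning words[i..j]."""
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i] = set(LEXICAL_RULES.get(word.lower(), ()))
    # Build longer spans from shorter ones: O(n^3) in sentence length.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left, right in product(chart[i][k], chart[k + 1][j]):
                    parent = BINARY_RULES.get((left, right))
                    if parent:
                        chart[i][j].add(parent)
    return chart


def is_sentence(words):
    """True if the whole word sequence can be derived from S."""
    return bool(words) and "S" in cky_parse(words)[0][len(words) - 1]


print(is_sentence("the analyst reads a report".split()))
print(is_sentence("reads the analyst".split()))
```

Real parsers use far larger grammars or neural models, but sentence length and ambiguity still drive their cost.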

7. Sentence Chaining

Sentence chaining is the process of connecting related sentences according to the strength of their association with a specific topic. Also known as sentence relation, it detects overarching commonalities between sentences even when they are many paragraphs apart, which makes more complex analyses possible.
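The article does not say how strength of association is measured, so this sketch assumes a simple stand-in: the cosine similarity of each sentence's word counts, with any pair at or above a hypothetical threshold of 0.3 treated as linked. Because every pair of sentences is compared, related sentences are found no matter how far apart they are.

```python
import math
import re
from collections import Counter

# Illustrative stopword list so that function words do not create links.
STOPWORDS = {"the", "a", "is", "was", "it", "and", "of", "to", "in", "our", "we"}

SENTENCES = [
    "The billing portal was down on Monday.",
    "Our new office opened in Madrid.",
    "Staff enjoyed the launch party.",
    "Customers could not reach the billing portal.",
]


def bag_of_words(sentence):
    """Count the non-stopword words in a sentence."""
    return Counter(w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter yields 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chain_sentences(sentences, threshold=0.3):
    """Return (i, j, score) for every sentence pair at or above threshold."""
    bags = [bag_of_words(s) for s in sentences]
    links = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            score = cosine(bags[i], bags[j])
            if score >= threshold:
                links.append((i, j, round(score, 2)))
    return links


print(chain_sentences(SENTENCES))
```

Here the first and last sentences are linked through "billing portal" even though unrelated sentences sit between them.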

How Micro Focus Can Help Your Business Intelligence

Text mining has a wide range of applications in business today. Perhaps the most widely used is Voice of the Customer (VoC).

By mining text from call center systems, social networking sites, online reviews, and other data sources, businesses can uncover sentiments, trends, patterns, and relationships within massive volumes of data that vary in both format and origin. This information can be used to correct product problems, enhance customer service, and plan new marketing campaigns. Text mining analytics may also be applied to job candidate screening, spam blocking, web content classification, flagging fraudulent insurance claims, and disease diagnosis.
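Sentiment mining for Voice of the Customer can be sketched with a polarity lexicon. In this example the word list, the negation rule (flip the polarity of a word that directly follows "not", "never", or "no"), and the sample reviews are all assumptions; it illustrates the idea rather than any production sentiment model.

```python
import re

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"great": 1, "fast": 1, "helpful": 1, "love": 1,
           "slow": -1, "broken": -1, "rude": -1}
NEGATORS = {"not", "never", "no"}

REVIEWS = [
    "Great product and fast delivery",
    "Support was not helpful and shipping was slow",
    "The box arrived on Tuesday",
]


def sentiment(text):
    """Sum word polarities, flipping any word directly after a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


for review in REVIEWS:
    print(label(sentiment(review)), "|", review)
```

Aggregating these labels across thousands of reviews or call transcripts is one way to track how customers feel about a product over time.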

Micro Focus IDOL helps organizations gain a competitive edge by simplifying access to disparate data sources and automating the integrated analysis of text, video, image, and audio to yield fast and comprehensive insights.


Micro Focus text analytics aggregates and sifts through vast and varied data quickly. It classifies information in real time into concept clusters, helping to accelerate productivity. To learn more about how Micro Focus text analytics can take your business intelligence and strategic decision-making to the next level, visit the Micro Focus text analytics page and fill out the contact form.


1: Source: The Forrester Wave™: AI-Based Text Analytics Platforms, Q2 2018, Forrester Research, Inc., June 14, 2018.

