
Detecting plagiarism using OpenText Analytics (IDOL): The Role of Document Fingerprinting

by   in Unstructured Data Analytics

Plagiarism has been in the news a lot lately, especially in academia. Ensuring originality in academic, literary, and professional writing is more pressing than ever. This post looks at how OpenText Analytics (IDOL) can be used to detect plagiarism and help safeguard intellectual property. At the heart of the approach is document fingerprinting, the technique that underpins IDOL's plagiarism detection capabilities.

Document fingerprinting

Document fingerprinting is a technique used in computer science, particularly in the field of information security and data analysis. It involves creating a unique identifier or "fingerprint" for a document or a set of documents. This fingerprint is usually a concise representation that captures the essence of the document's content. The concept is analogous to human fingerprints, which uniquely identify individuals.
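To make the idea concrete, here is a minimal Python sketch of one common way to fingerprint text: split it into overlapping runs of words (often called shingles) and hash each run. This is an illustration only and is not how IDOL computes its fingerprints. The function name `fingerprints`, the window size `k`, and the choice of SHA-1 truncated to 12 hex characters are all assumptions made for this example.

```python
import hashlib
import re
from typing import Set


def fingerprints(text: str, k: int = 5) -> Set[str]:
    """Return one short hash per run of k consecutive words (hypothetical helper)."""
    # Normalize: lowercase and keep only alphanumeric tokens, so that
    # punctuation and capitalization do not change the fingerprints.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of k words across the text.
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    # Hash each window and keep a short prefix as the fingerprint.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles}
```

Because the text is normalized first, two passages that differ only in case or punctuation produce the same set of fingerprints, while text shorter than `k` words produces none.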

When documents are ingested into OpenText Analytics (IDOL), the fingerprint_string Lua function generates a list of fingerprints from the content of each document. Examples of fingerprints generated from document content are shown in the image below.

Document fingerprints generated on Wikipedia content



Once the original works have been fingerprinted, checking whether a piece of content contains plagiarism becomes much simpler. Fingerprints are generated for the content under review and then compared against a database in which the original works are indexed. If the fingerprints match, indicating plagiarism, the content can be categorized as plagiarized.
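The comparison step described above can be sketched in a few lines of Python. This is a simplified stand-in and does not use IDOL's index or its fingerprint_string function: the helpers `fingerprints`, `build_index`, and `overlap_scores` are hypothetical names, the fingerprints are plain hashes of five-word runs, and the "database" is an in-memory dictionary.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Set


def fingerprints(text: str, k: int = 5) -> Set[str]:
    """Hash every run of k consecutive words (illustrative, not IDOL's method)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles}


def build_index(originals: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each fingerprint to the ids of the original works that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in originals.items():
        for fp in fingerprints(text):
            index[fp].add(doc_id)
    return index


def overlap_scores(suspect: str, index: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each original, return the share of the suspect's fingerprints found in it."""
    fps = fingerprints(suspect)
    hits: Dict[str, int] = defaultdict(int)
    for fp in fps:
        for doc_id in index.get(fp, ()):
            hits[doc_id] += 1
    return {doc_id: count / len(fps) for doc_id, count in hits.items()}


originals = {
    "essay-a": "the quick brown fox jumps over the lazy dog near the river bank",
    "essay-b": "an unrelated passage about indexing strategies for enterprise search",
}
index = build_index(originals)
scores = overlap_scores("Yesterday the quick brown fox jumps over the lazy dog", index)
```

Here `scores` maps each matching original to the fraction of the reviewed text's fingerprints that it shares. A reviewer would still need to choose a threshold above which content is treated as plagiarized, and that threshold is a policy decision rather than something this sketch can supply.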

To see a demonstration of this function, please watch the following video.

Content acquisition

Content can be acquired using OpenText Analytics Connectors, which can pull content from 82 different content repositories. Text can also be extracted with OCR from old scanned PDFs or even images, so as to provide a holistic search capability.

For a deeper understanding of these advancements or to talk about potential collaborations, please feel free to explore OpenText's overview on AI text analytics or drop me an email at vjoseph@opentext.com.

More Information 

Learn more about what Unstructured Data Analytics can do for you.


The OpenText Analytics & AI team

