About

This is the site of the special chair of text mining at the Department of Data Science and Knowledge Engineering (DKE), Faculty of Science and Engineering, Maastricht University. The extraordinary chair is sponsored by ZyLAB Technologies and has been held by Johannes (Jan) C. Scholtes since 2008.

On this site, you can find blog posts on various topics related to text mining, artificial intelligence and deep learning. There are also downloads for PhD and MSc student projects, publications and educational material.

Text mining generally refers to the process of extracting interesting and non-trivial information and knowledge from unstructured text. Information extraction ranges from extracting simple semantic entities to recognizing advanced patterns such as facts, events, sentiments, emotions or even more abstract concepts.
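To make the "simple semantic entities" end of that range concrete, here is a minimal sketch in Python using only regular expressions. The entity labels, the patterns and the sample sentence are assumptions chosen for illustration; they are not taken from any particular text-mining product, and real systems typically rely on much richer linguistic or statistical models.

```python
import re

# Hypothetical patterns for illustration only; each maps an entity label
# to a regular expression that recognizes one simple surface form.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),          # ISO dates only
    "MONEY": re.compile(r"(?:EUR|USD)\s?\d+(?:[.,]\d+)*"),  # e.g. "EUR 12,500"
}


def extract_entities(text):
    """Return (label, value) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by start offset so the result follows reading order.
    return [(label, value) for _, label, value in sorted(found)]


sample = "On 2008-09-01 j.doe@example.org approved a payment of EUR 12,500."
entities = extract_entities(sample)
for label, value in entities:
    print(label, value)
```

Facts, events and sentiments cannot be captured reliably with surface patterns like these, which is where the more advanced techniques mentioned above come in.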

Document classification algorithms help to organize large document collections and give users a high-level overview. Ideally, the documents are organized in such a way that the answers to research or investigative questions become more or less clear from the organization and presentation of the data to the investigator.
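As a rough illustration of document classification, the following Python sketch trains a small multinomial Naive Bayes classifier with add-one smoothing on a handful of labeled snippets. Naive Bayes is only one of many possible algorithms and is used here as an assumption for the sake of a short example; the category names, training texts and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def train(labeled_docs):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
training = [
    ("contract breach damages court", "legal"),
    ("court ruling appeal contract", "legal"),
    ("patient trial dosage results", "clinical"),
    ("clinical trial patient outcome", "clinical"),
]
model = train(training)
print(classify("appeal against the court ruling", model))
print(classify("dosage for each patient", model))
```

Once every document in a collection carries such a label, the collection can be presented per category, which gives the investigator the kind of high-level overview described above.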

Text mining encompasses several computer science disciplines with a strong orientation towards artificial intelligence and data science in general, including but not limited to pattern recognition, neural networks, natural language processing, information retrieval, clustering and machine learning. An important difference from standard information retrieval techniques is that information retrieval requires a user to know what he or she is looking for, while text mining attempts to discover information in patterns that are not known beforehand. This is very relevant, for example, in scientific research, criminal and internal investigations, legal discovery, case law research, (business) intelligence, clinical research, or due diligence investigations. As a result, text-mining techniques allow one to find more relevant documents without suffering from too many non-relevant search results.