As of version 3.4.0, BaseLanguageModelFactory became the default language model factory. That factory produces language models containing identity stemmers, which may lower clustering quality. The Java and C# API examples, such as ClusteringDocumentList, do not override the language model factory and may therefore also produce lower-quality clusters.
The following Java API example classes are affected:
The following C# API example classes are affected:
Similarly, any other code that uses the Carrot2 Java or C# API without the workaround shown below may produce clusters of lower quality.
Other applications, including the Carrot2 Document Clustering Workbench, Carrot2 Document Clustering Server, Carrot2 Web Application, Carrot2 Command Line Interface and the Solr clustering plugin, are not affected by this issue.
The fix is to set the language model factory to DefaultLanguageModelFactory, preferably during the initialization of the controller.
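A minimal Java sketch of that workaround follows. It is not the project's official snippet: the attribute key string, the helper class LanguageModelFactoryWorkaround and its initAttributes method are assumptions made for illustration, and the key should be checked against the attribute reference of the Carrot2 version in use. The Carrot2 calls appear only in comments so that the block compiles without the library.

```java
import java.util.HashMap;
import java.util.Map;

public class LanguageModelFactoryWorkaround {
    // ASSUMPTION: key of the preprocessing pipeline's language model factory
    // attribute. Verify it against your Carrot2 version before relying on it.
    static final String LANGUAGE_MODEL_FACTORY_KEY =
        "PreprocessingPipeline.languageModelFactory";

    // Hypothetical helper: builds the init-time attribute map that overrides
    // the language model factory with the given implementation class.
    static Map<String, Object> initAttributes(Class<?> factoryClass) {
        Map<String, Object> attributes = new HashMap<String, Object>();
        attributes.put(LANGUAGE_MODEL_FACTORY_KEY, factoryClass);
        return attributes;
    }

    // With Carrot2 on the classpath, the map would be passed to the controller
    // once, before any processing request (assumed usage, not compiled here):
    //
    //   Controller controller = ControllerFactory.createSimple();
    //   controller.init(initAttributes(DefaultLanguageModelFactory.class));
}
```

Setting the attribute at initialization time, rather than per request, means every subsequent clustering call made through the same controller uses the overridden factory.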
Alternatively, if you use component suites and XML configuration for attributes, add the following declaration to the relevant value-set tag: