Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0
- Fix Version/s: 2.1
- Component/s: Clustering Algorithms
- Labels: None
Description
Search results occasionally contain snippets in different languages. Combined with inaccurate language recognition, this can produce meaningless cluster labels consisting of, e.g., stop words. Applying all known stop lists (not only the one for the recognized language) would fix the problem. Stop word list merging could be implemented at the level of the tokenizer component, so that all clustering algorithms using the component would benefit.
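To illustrate the stop list merging proposed above, here is a minimal Python sketch. It is not the project's actual tokenizer: the `STOP_WORDS` table, its tiny word lists, and the function names `merged_stop_words` and `tokenize` are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical, deliberately tiny stop lists keyed by language code.
# Real lists would be much longer and loaded from per-language resources.
STOP_WORDS = {
    "en": {"the", "and", "of", "in"},
    "de": {"der", "die", "und", "das"},
    "pl": {"i", "w", "na", "z"},
}


def merged_stop_words(stop_lists):
    """Return the union of all per-language stop lists, lowercased."""
    merged = set()
    for words in stop_lists.values():
        merged.update(word.lower() for word in words)
    return frozenset(merged)


def tokenize(text, stop_words):
    """Split text into lowercase word tokens and flag each stop word.

    Returns a list of (token, is_stop_word) pairs, so that downstream
    clustering code can decide how to treat flagged tokens.
    """
    return [(token, token in stop_words)
            for token in re.findall(r"\w+", text.lower())]


all_stops = merged_stop_words(STOP_WORDS)
tokens = tokenize("Der Hund und the dog", all_stops)
content_words = [token for token, is_stop in tokens if not is_stop]
```

Because the merged set is applied regardless of the detected language, a mixed German and English snippet has its stop words flagged even if language recognition picks the wrong language. One trade-off to keep in mind: a word that is a stop word in one language may carry meaning in another (for example, "die" is a German article but an English verb), so merging can suppress some legitimate terms.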
Switched Lingo to multilingual mode, which should solve the problem.