Carrot2

Stop list merging for Lingo

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.0
  • Fix Version/s: 2.1
  • Component/s: Clustering Algorithms
  • Labels: None

Description

Search results sometimes contain snippets in several languages. Combined with inaccurate language recognition, this can produce meaningless cluster labels, for example labels consisting only of stop words. Applying all known stop lists, rather than only the list for the recognized language, would fix the problem. Stop list merging could be implemented in the tokenizer component, so that every clustering algorithm using that component would benefit.
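
A minimal sketch of the merging idea described above, assuming stop lists are available as per-language sets of words. The class StopListMerging and the methods mergeStopLists and removeStopWords are hypothetical names used for illustration only; they are not part of the Carrot2 API, and the actual tokenizer component may be organized differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not Carrot2 code.
public class StopListMerging {

    /** Merges every per-language stop list into a single lowercase set. */
    static Set<String> mergeStopLists(Map<String, Set<String>> stopListsByLanguage) {
        Set<String> merged = new HashSet<>();
        for (Set<String> stopList : stopListsByLanguage.values()) {
            for (String word : stopList) {
                merged.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return merged;
    }

    /** Drops tokens found in the merged stop list, ignoring case. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(token -> !stopWords.contains(token.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Tiny example stop lists; real lists would be much longer.
        Map<String, Set<String>> stopLists = Map.of(
                "en", Set.of("the", "of", "and"),
                "de", Set.of("der", "die", "und"));

        Set<String> merged = mergeStopLists(stopLists);

        // A mixed-language snippet: filtering does not depend on which
        // language the recognizer assigned to it.
        List<String> tokens = Arrays.asList("Der", "Clustering", "of", "und", "Suchergebnisse");
        System.out.println(removeStopWords(tokens, merged)); // prints [Clustering, Suchergebnisse]
    }
}
```

Because the merged set is applied regardless of the detected language, a misclassified snippet still has its stop words filtered. The trade-off is that a word that is a stop word in one language but meaningful in another is dropped everywhere.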

Activity

Stanisław Osiński added a comment -

Switched Lingo to multilingual mode, which should solve the problem.


