Carrot2

Bug in document counting in STCTree.

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 3.0.1
  • Fix Version/s: 3.1.0
  • Component/s: Clustering Algorithms
  • Labels:
    None

Description

For example, take the two sentences "cat ate cheese too" and "mouse ate cheese too". Constructing the GST from them works correctly, but the result of PhraseNode.getInternalDocumentsRepresentation() is wrong.
After analyzing the STC code, I found that PhraseNode.docs.set() is called only when a node is created.
When a new ISuffixableElement is later added to the GST, any existing node it passes through is not updated: the existing node for "ate cheese too EOS" does not record the second document, "mouse ate cheese too", when the insertion traverses it.
The cause seems to be in the source code of "org.carrot2.text.suffixtrees.SuffixTree". When insertPrefix() finds an existing edge, it breaks out immediately, so the node's "docs" and "elementsInNode" properties are never updated.
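To make the reported mechanism concrete, here is a minimal sketch of a word-level generalized suffix trie in which every node keeps a BitSet of the documents containing its phrase. It is hypothetical and far simpler than org.carrot2.text.suffixtrees.SuffixTree (no Ukkonen construction, no edge compression, no EOS terminator). The class GeneralizedSuffixTrie, its methods and the markExistingNodes flag are illustrative names, not Carrot2 API. With the flag off, it reproduces the behavior described above, where a document is recorded only when a node is created. With the flag on, every node traversed during insertion records the document.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified generalized suffix trie over word tokens.
 * Not Carrot2's SuffixTree: one node per token, no edge compression.
 */
public class GeneralizedSuffixTrie {

    /** One node per distinct phrase; docs records which documents contain it. */
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final BitSet docs = new BitSet();
    }

    private final Node root = new Node();
    private final boolean markExistingNodes;

    /**
     * @param markExistingNodes true for the corrected behavior; false reproduces
     *        the reported bug, where only newly created nodes record the document.
     */
    public GeneralizedSuffixTrie(boolean markExistingNodes) {
        this.markExistingNodes = markExistingNodes;
    }

    /** Inserts every suffix of the document's token sequence. */
    public void addDocument(int docIndex, List<String> tokens) {
        for (int start = 0; start < tokens.size(); start++) {
            Node node = root;
            for (int i = start; i < tokens.size(); i++) {
                Node child = node.children.get(tokens.get(i));
                if (child == null) {
                    child = new Node();
                    child.docs.set(docIndex); // new node: always record the document
                    node.children.put(tokens.get(i), child);
                } else if (markExistingNodes) {
                    child.docs.set(docIndex); // existing node: the step the bug skips
                }
                node = child;
            }
        }
    }

    /** Returns a copy of the document set for a phrase, or an empty set if absent. */
    public BitSet documentsOf(List<String> phrase) {
        Node node = root;
        for (String token : phrase) {
            node = node.children.get(token);
            if (node == null) {
                return new BitSet();
            }
        }
        return (BitSet) node.docs.clone();
    }

    public static void main(String[] args) {
        List<String> d0 = List.of("cat", "ate", "cheese", "too");
        List<String> d1 = List.of("mouse", "ate", "cheese", "too");
        for (boolean fixed : new boolean[] {false, true}) {
            GeneralizedSuffixTrie trie = new GeneralizedSuffixTrie(fixed);
            trie.addDocument(0, d0);
            trie.addDocument(1, d1);
            System.out.println((fixed ? "fixed: " : "buggy: ")
                + trie.documentsOf(List.of("ate", "cheese", "too")));
        }
    }
}
```

Running main prints `buggy: {0}` and then `fixed: {0, 1}` for the phrase "ate cheese too": the second document's suffix walks through nodes created by the first one, and only the corrected variant records it there. The sketch uses List.of, so it needs Java 9 or later.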

Please check and confirm this. I have been blocked for days trying to integrate it into my experiments.
Thank you.

Activity

Stanisław Osiński added a comment -

Assigning to Dawid for investigation and a possible fix.

Dawid Weiss added a comment -

Can you provide a patch? It would be easier to see where the bug actually is.

Dawid Weiss added a comment -

I changed the title of this bug. I think I know where the bug is; I will commit a test case with a patch as soon as I figure out how to fix it.

Dawid Weiss added a comment -

OK, I think I've identified the problem. I will attach a patch here, but I have to analyze exactly what the side effects are, so the commit will be delayed a bit.

Dawid Weiss added a comment -

Initial fix for the number of documents in the GST.

Dawid Weiss added a comment -

Fixed in trunk.

ruilong yang added a comment -

I am not familiar with the patch format. How can I apply this patch to my program?

Dawid Weiss added a comment -

Just retrieve the most recent version of Carrot2 from the repository; the fix has already been integrated into the code.
