  1. Carrot2
  2. CARROT-1081

BisectingKMeansClusteringAlgorithm document assignment bug

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.9.4
    • Fix Version/s: 3.9.5, 3.10.0
    • Component/s: Clustering Algorithms
    • Labels:
      None

      Description

      As reported by Sergio Queiroz:

      I was running BisectingKMeansClusteringAlgorithm on some documents and saw that, contrary to expectation, it did not return a hard clustering, i.e., the same document appeared in multiple clusters. I looked at the code and found that the problem was likely due to a bug at line 442, where the following block starts:
      
      if (it < iterations - 1)
      {
          previousResult = result;
          result = Lists.newArrayList();
          for (int i = 0; i < partitions; i++)
          {
              result.add(new IntArrayList(selected.columns()));
          }
      }
      
      Because of this condition, the result list is not re-initialized in the last iteration, so the last iteration adds elements to the partitions left over from the iteration before it. I removed the "if" (so that the code inside it executes for every iteration) and the algorithm started behaving as expected.
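
The effect described above can be illustrated with a toy model. This is a minimal sketch, not the Carrot2 source: the class `GuardedResetSketch`, its methods, and the assignment rule `(d + it) % partitions` are all hypothetical, plain `java.util` lists stand in for `IntArrayList`, and placing the guarded reset at the top of the loop is an assumption chosen to reproduce the reported symptom.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the reported symptom; not the Carrot2 implementation.
class GuardedResetSketch {
    // Creates one empty list per partition.
    static List<List<Integer>> newPartitions(int partitions) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            result.add(new ArrayList<>());
        }
        return result;
    }

    // Runs `iterations` assignment passes over `docs` documents.
    // With guardReset == true the partition lists are re-created in every
    // iteration except the last one (the assumed buggy behavior); with
    // guardReset == false they are re-created in every iteration.
    static List<List<Integer>> run(int docs, int partitions, int iterations, boolean guardReset) {
        List<List<Integer>> result = newPartitions(partitions);
        for (int it = 0; it < iterations; it++) {
            if (!guardReset || it < iterations - 1) {
                result = newPartitions(partitions);
            }
            for (int d = 0; d < docs; d++) {
                // Made-up assignment rule that changes between iterations.
                result.get((d + it) % partitions).add(d);
            }
        }
        return result;
    }

    // Counts document entries over all partitions.
    static int totalAssignments(List<List<Integer>> clusters) {
        int total = 0;
        for (List<Integer> cluster : clusters) {
            total += cluster.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // 4 documents, 2 partitions, 3 iterations.
        System.out.println("guarded reset:   " + totalAssignments(run(4, 2, 3, true)));
        System.out.println("unguarded reset: " + totalAssignments(run(4, 2, 3, false)));
    }
}
```

In this model the guarded variant ends with more entries than documents, because the final pass appends to lists that still hold the previous pass's assignments. The unguarded variant ends with exactly one entry per document.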
      

      I believe Sergio's insight is correct – I looked at the code and can't find a reason for the 'if' to be there.
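
A regression check for this report could assert that the output is a hard clustering, i.e. every document index appears in exactly one cluster. The sketch below is one way to express that property. It assumes clusters are given as lists of document indices in the range 0 to docCount - 1, and the class and method names are hypothetical rather than part of the Carrot2 API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Carrot2.
class HardClusteringCheck {
    // Returns true if each document index in [0, docCount) occurs in exactly
    // one cluster: no index is repeated and none is missing or out of range.
    static boolean isHardClustering(List<List<Integer>> clusters, int docCount) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> cluster : clusters) {
            for (int doc : cluster) {
                if (doc < 0 || doc >= docCount) {
                    return false; // index outside the document range
                }
                if (!seen.add(doc)) {
                    return false; // document already seen in some cluster
                }
            }
        }
        return seen.size() == docCount;
    }

    public static void main(String[] args) {
        List<List<Integer>> hard = Arrays.asList(Arrays.asList(0, 2), Arrays.asList(1, 3));
        List<List<Integer>> overlapping = Arrays.asList(Arrays.asList(0, 1, 2), Arrays.asList(1, 3));
        System.out.println(isHardClustering(hard, 4));
        System.out.println(isHardClustering(overlapping, 4));
    }
}
```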

            People

            Assignee:
            dweiss Dawid Weiss
            Reporter:
            dweiss Dawid Weiss