Carrot2

Random snippet sorting order in the Demo Browser?

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.0
  • Fix Version/s: 2.1
  • Component/s: None
  • Labels:
    None

Description

It looks like the Demo Browser shows snippets in clusters in a more or less random order. I had a brief look at the code, but I guess this needs some more serious debugging.

Activity

Dawid Weiss added a comment -

Can you provide an example of this random order? The order should be identical to the order in which documents were added (that is, fetched from the source); this is in fact handled in:

public StringBuffer appendHtmlForDocuments(StringBuffer buffer, Collection documents, boolean resortDocuments) {
    if (resortDocuments) {
        // Re-sort documents according to their addition sequence.
        final ArrayList docsCopy = new ArrayList(documents);
        Collections.sort(docsCopy, DOCUMENT_SEQ_COMPARATOR);
        documents = docsCopy;
    }

so documents are re-sorted according to their sequential number at the input. Perhaps one of your processes does not include the RawDocumentEnumerator in the pipeline?
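
The re-sorting step described above can be sketched in isolation. This is a minimal illustration and not Carrot2's actual code: the `Doc` class, its `seq` field, and the `inFetchOrder` helper are hypothetical names, and the sketch assumes each document carries the sequence number assigned when it was fetched from the source.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SequenceSortSketch {
    /** Hypothetical stand-in for a fetched document; not a Carrot2 class. */
    static final class Doc {
        final int seq;       // assumed: position at which the document arrived from the source
        final String title;

        Doc(int seq, String title) {
            this.seq = seq;
            this.title = title;
        }
    }

    /** Orders documents by their assumed input sequence number. */
    static final Comparator<Doc> BY_SEQ = Comparator.comparingInt(d -> d.seq);

    /** Returns a sorted copy, so the caller's collection is left untouched. */
    static List<Doc> inFetchOrder(Collection<Doc> documents) {
        List<Doc> copy = new ArrayList<>(documents);
        Collections.sort(copy, BY_SEQ);
        return copy;
    }
}
```

Sorting a copy mirrors the `docsCopy` approach in the quoted method. If a pipeline never assigns sequence numbers, every `seq` would be equal and the stable sort would simply preserve the incoming order.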

Stanisław Osiński added a comment -

Checking the process descriptor was the first thing I did after looking at the code. But now I know what's going on: the documents are sorted only if the cluster being shown has subclusters (I remember we discussed this at some point). That is good, but inconsistent with the order of documents shown for clusters without subclusters (which is the original order returned by the clustering algorithm).

Until we start working on a new RCP-based browser, I'd suggest sorting documents regardless of whether the cluster has subclusters. And when we have an RCP-based browser, we could make it configurable: 1) always sort (suggested default), or 2) preserve the clustering algorithm's order (and show, e.g., depth-first order for hierarchical clusters).
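
The two proposed options could look roughly like the sketch below. Everything here is hypothetical: the `DocumentOrder` enum, the `Cluster` class, and `documentsToShow` are illustrative names rather than Demo Browser code, documents are reduced to their input sequence numbers, and each document is assumed to belong to only one cluster.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ClusterOrderSketch {
    /** Hypothetical setting covering the two options proposed above. */
    enum DocumentOrder { ALWAYS_SORT, PRESERVE_ALGORITHM_ORDER }

    /** Hypothetical cluster; documents are represented by their input sequence numbers. */
    static final class Cluster {
        final List<Integer> docSeqs = new ArrayList<>();     // own documents, in algorithm order
        final List<Cluster> subclusters = new ArrayList<>();
    }

    /** Lists a cluster's documents using the same rule for leaf and non-leaf clusters. */
    static List<Integer> documentsToShow(Cluster cluster, DocumentOrder order) {
        List<Integer> out = new ArrayList<>();
        collectDepthFirst(cluster, out);
        if (order == DocumentOrder.ALWAYS_SORT) {
            out.sort(Comparator.naturalOrder());             // back to input (fetch) order
        }
        return out;
    }

    /** Own documents first, then each subcluster's documents, recursively. */
    private static void collectDepthFirst(Cluster cluster, List<Integer> out) {
        out.addAll(cluster.docSeqs);
        for (Cluster sub : cluster.subclusters) {
            collectDepthFirst(sub, out);
        }
    }
}
```

Because the sort decision depends only on the setting and never on whether subclusters exist, both kinds of cluster are displayed consistently under either option.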

Stanisław Osiński added a comment -

Fixed – now cluster documents are always sorted, no matter whether the cluster has subclusters or not.

