Carrot2

Upgrade Nutch plugin to use the 3.x release

Details

  • Type: Task
  • Status: Resolved
  • Priority: Major
  • Resolution: Invalid
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

Related issue on Apache JIRA: https://issues.apache.org/jira/browse/NUTCH-673

  1. Clusterer.java
    29/Mar/11 11:57 AM
    6 kB
    Stanisław Osiński
  2. HitsClusterAdapter.java
    29/Mar/11 11:57 AM
    3 kB
    Stanisław Osiński
  3. TestClusterer.java
    29/Mar/11 11:57 AM
    6 kB
    Stanisław Osiński

Activity

Stanisław Osiński added a comment -

While we're waiting for Lucene 2.9.1 to come out, maybe we would be able to handle this for 3.1.1?

Dawid Weiss added a comment -

Investigated the possibilities here.

Nutch still uses Lucene 2.9.x, whereas we use Lucene 3.0.0. Adding Carrot2 3.0+ to Nutch would also require a number of other libraries, some of them heavy (Mahout, Google Collections, etc.). I don't know whether the Nutch folks will appreciate this much.

What do you think: should we try, or leave Nutch on the 2.x line?

Stanisław Osiński added a comment -

I think the extra libraries wouldn't add more than 1 or 2 MB in total, right? So the biggest problem seems to be Lucene. Maybe we could schedule this for when Nutch upgrades Lucene? After all, upgrading from 2.9.x to 3.0.0 is only a matter of fixing deprecations. I don't see a relevant issue in Nutch's JIRA, though.

Dawid Weiss added a comment -

Unfortunately, the older Lucene (2.9) is a show-stopper for this: there are API incompatibilities that cause exceptions at runtime. I'll file an issue with Nutch; perhaps they'll want to upgrade, and then we can proceed.

Dawid Weiss added a comment -

Equivalent issue in Nutch: https://issues.apache.org/jira/browse/NUTCH-673

Stanisław Osiński added a comment -

We need to wait until Nutch upgrades to Lucene 3.0. Moving to 3.3.0 for the time being.

Dawid Weiss added a comment -

Will upgrade after we release 3.4.0.

Stanisław Osiński added a comment -

Some rough-cut Nutch integration code for Carrot2 3.x that I once prepared for a client.

Dawid Weiss added a comment -

Nutch doesn't come with a frontend anymore. The clustering plugin has been removed (and there is Solr, which can be used as the sink for Nutch's crawls).
